CHAPTER 1:

Overview of the IEA International Computer and Information Literacy Study 2018


Julian Fraillon, Sebastian Meyer, and John Ainley 


Introduction 


The International Association for the Evaluation of Educational Achievement (IEA) International Computer and Information Literacy Study (ICILS) 2018 investigated how well students are prepared for study, work, and life in a digital world. There is increasing acknowledgment across countries that with rapid advancement of new technologies it is important to develop the capacities of people to use information and communication technologies (ICT) (see, for example, European Commission 2018). ICILS 2018 focused on “the capacities of students to use ICT productively for a range of purposes, in ways that go beyond a basic use of ICT” (Fraillon et al. 2019, p. 1). ICILS 2018 was based on and expanded the work of ICILS 2013 (Fraillon et al. 2014).


ICILS 2018 included three main focus areas. Firstly, ICILS 2018 assessed computer and information literacy (CIL), which was defined as “an individual’s ability to use computers to investigate, create, and communicate in order to participate effectively at home, at school, in the workplace, and in society” (Fraillon et al. 2019, p. 16). CIL was also assessed in the first study cycle, ICILS 2013. Secondly, as an optional component for participating countries, ICILS 2018 assessed computational thinking (CT), which is defined as the “ability to recognize aspects of real-world problems which are appropriate for computational formulation and to evaluate and develop algorithmic solutions to those problems so that the solutions could be operationalized with a computer” (Fraillon et al. 2019, p. 27). CT was not assessed in ICILS 2013. The assessment of CT was an innovative and engaging challenge for the students, evaluating not only their ability to analyze and break down a problem into logical steps but also assessing their understanding of how computers might be used to solve a problem. Thirdly, ICILS 2018 investigated the contexts in which students’ CIL and CT are developed by collecting and analyzing data relating to the use of computers and other digital devices by students and teachers, and the resources available to support the teaching and learning of CIL and CT in schools.

ICILS was based around four research questions concerned with: (1) variations in CIL and CT between and within countries; (2) aspects of schools, education systems, and teaching that are related to student achievement in CIL and CT; (3) the extent to which students’ access to, familiarity with, and self-reported proficiency in using ICT are related to student achievement in CIL and CT; and (4) the aspects of students’ personal and social backgrounds that are related to CIL and CT.

The ICILS 2018 assessment framework (Fraillon et al. 2019) describes the development of these research questions. The framework also provides greater detail relating to the measured domains and outlines the variables necessary for analyses associated with the research questions.

ICILS systematically reviewed differences among participating countries and education systems¹ with regard to students’ CIL and CT, and how participating countries and systems provided and supported ICT-related education. It explored differences within and across countries with respect to the relationship between ICT-related learning outcomes, student characteristics, and school contexts. The outcomes of these reviews and analyses are reported in the ICILS 2018 international report (Fraillon et al. 2020).


1 Education systems are units within countries with a degree of educational autonomy that have participated following 
the same standards for sampling and testing as countries. In this report, education systems are often referred to as 
countries for ease of reading. 


Instruments
CIL test

Students completed a computer-based test of CIL that consisted of questions and tasks that were administered as five different 30-minute modules. Three of the CIL modules had been developed and used in ICILS 2013 and kept secure as trend modules. These were included to allow data collected in ICILS 2018 to be equated with data from the previous cycle and reported on the CIL proficiency scale established for ICILS 2013. Consequently, it was possible to compare CIL achievement over time in those countries that participated in both cycles. Two new modules were developed for the ICILS 2018 CIL test instrument to address contemporary thematic content and software environments. Data collected from all five CIL modules in ICILS 2018 were used as the basis for reporting ICILS 2018 CIL results on the ICILS CIL achievement scale established in 2013.


Each student completed two modules randomly allocated from the set of five available modules so that the total assessment time for each student was one hour. Each of the assessment modules consisted of a set of questions and tasks based on a realistic theme and following a linear narrative structure. These modules consisted of a series of small discrete tasks (typically taking less than a minute to complete) followed by a large task that typically took 15 to 20 minutes to complete. In total, the modules comprised 81 discrete questions that generated 102 score points.


When students began each module, they were first presented with an overview of the theme and purpose of the tasks in the module, including a basic description of what the large task would comprise. In the narrative of each module, the smaller discrete tasks typically comprised a mix of skill-execution and information-management tasks that built towards completion of the large task. Students were required to complete the tasks in the allocated sequence and could not return to review completed tasks.

The five modules measuring students’ CIL were:

• Band competition (2013 & 2018): Students planned a website, edited an image, and used a simple website builder to create a webpage with information about a school band competition.
• Breathing (2013 & 2018): Students managed files and evaluated and collected information in order to create a presentation explaining the process of breathing to eight- or nine-year-old students.
• School trip (2013 & 2018): Using online database tools, students helped plan a school trip and selected and adapted information to produce an information sheet about the trip for their peers. The information sheet included a map created using an online mapping tool.
• Board games (2018): Students used a school-based social network for direct messaging and group posting to encourage peers to join a board games interest group.
• Recycling (2018): Students accessed and evaluated information from a video sharing website to identify a suitable information source relating to waste reduction, reuse, and recycling. Students took research notes from the video and used their notes as the basis for designing an infographic to raise awareness about waste reduction, reuse, and recycling.


In total, there were 20 different possible combinations of module pairs. Each module appeared in eight of the combinations: four times as the first and four times as the second module when paired with each of the other four. The module combinations were randomly allocated to students. This test design made it possible to assess a larger amount of content than could be completed by any individual student and thus ensured broad coverage of the content of the ICILS 2018 assessment framework. The design also controlled for item position effects on task difficulty across the sampled students and provided a variety of contexts for the assessment of CIL.
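The rotation arithmetic above can be checked with a short script. The sketch below assumes nothing beyond the five module names given in the text: it enumerates ordered pairs of distinct modules, since a module can be taken either first or second. The `allocate` function is a hypothetical simple random draw for illustration only; the text says the combinations were randomly allocated but does not describe the operational allocation procedure.

```python
import random
from itertools import permutations

# Module names as listed in the text; the allocation logic below is an
# illustrative assumption, not the ICILS operational procedure.
MODULES = ["Band competition", "Breathing", "School trip", "Board games", "Recycling"]


def module_pairs(modules):
    """Return every ordered pair of two different modules.

    Order matters: (A, B) means A is taken first and B second, so
    (A, B) and (B, A) are counted as different combinations.
    """
    return list(permutations(modules, 2))


def position_counts(pairs, module):
    """Count how often `module` appears as the first and as the second module."""
    first = sum(1 for a, _ in pairs if a == module)
    second = sum(1 for _, b in pairs if b == module)
    return first, second


def allocate(student_ids, pairs, seed=None):
    """Hypothetical allocator: give each student one pair drawn at random."""
    rng = random.Random(seed)
    return {sid: rng.choice(pairs) for sid in student_ids}


PAIRS = module_pairs(MODULES)

if __name__ == "__main__":
    print(len(PAIRS))  # 5 modules x 4 possible partners = 20 ordered pairs
    for m in MODULES:
        print(m, position_counts(PAIRS, m))  # (4, 4) for every module
```

Enumerating ordered rather than unordered pairs is what yields 20 combinations (5 × 4) instead of 10, and it is also what places each module first four times and second four times.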


OVERVIEW OF ICILS 2018 


CT test 


Students in those countries participating in the CT assessment completed a computer-based test of CT consisting of two 25-minute modules that were developed for ICILS 2018 and corresponded to the two strands of the CT construct: one to conceptualizing problems and the other to operationalizing solutions. Each CT module comprised a theme and a sequence of related tasks but, unlike the CIL modules, did not culminate in a large task. Students completed the two CT modules after they had completed both the CIL test and the student questionnaire. The CT test comprised […] score points derived from 18 discrete tasks and questions. Student responses to most tasks were automatically scored. The exceptions were some […], which were scored by trained scorers at each national center.

The tasks in one CT module primarily focused on planning digital solutions for a driverless bus. The set of tasks included manipulating and interpreting visual representations of information and processes associated with behavior of the bus, and configuring a simulation tool to collect data and draw conclusions about the behavior of the bus under specified conditions.

In the CT module focusing on operationalizing solutions, students worked within a simple coding environment comprising blocks of code that have some specified and some configurable functions and could be assembled in […] in farming. Students were required […] were presented with a work space, […] drone completing the command […] and actions required to solve the problem instance. The tasks became more complex as the students advanced through the module. The complexity of the tasks related to the variety of code functions that were available and the sequence of actions required by the drone […].


Questionnaires


The completion of the CIL assessment was followed by a 30-minute student questionnaire administered on computer. The questionnaire included questions relating to students’ background characteristics, their experience and use of ICT to complete a range of different tasks in school and out of school, and their attitudes towards […].

Three further instruments were designed to gather information from and about teachers and schools:

• A 30-minute teacher questionnaire included some questions relating to teacher background characteristics, followed by questions relating to teachers’ reported familiarity with ICT, their use of ICT in teaching focused on a “reference class,”² their perceptions of ICT in educational activities in schools, and their participation in professional learning activities relating to the use of ICT when teaching.
• A 15-minute ICT coordinator questionnaire asked ICT coordinators about the resources available in their school to support the use of ICT in teaching and learning. In addition, the questionnaire gathered information about schools’ technological (e.g., infrastructure, hardware, and software) and pedagogical support as well as professional learning and hindrances to the use of ICT in school education.
• A 15-minute principal questionnaire asked principals to provide information about school characteristics and then about school approaches to ICT-related teaching and incorporating ICT into teaching and learning.


2 The “reference class” was defined as the first target grade class taught by the respondent for a regular subject (i.e., other than home room, assembly, etc.) on or after the Tuesday following the last weekend before the respondent first accessed the questionnaire. See page 7 for a complete definition of the reference class.


ICILS 2018 TECHNICAL REPORT 


An additional national context questionnaire was used to gather information from ICILS 2018 national 
centers about national contexts for and approaches to the development of students’ CIL and CT. 
This included information on policies and practices as well as on expectations and requirements 
for the pedagogical use of ICT. When answering this questionnaire, which was administered online, 
national centers were requested to draw on all available national expertise to provide the required 
information as well as to provide reference documents where appropriate. 


Computer-based test delivery 


ICILS 2018 used purpose-designed software for the computer-based student assessment and questionnaire. These were administered primarily using USB drives connected to school computers. After administration of the student instruments, the ICILS research team either directly uploaded data to a server or submitted this information to national research centers for subsequent upload by national center staff.

The teacher and school questionnaires were usually completed online (directly accessing a server at IEA over the internet). However, respondents were also offered the option of completing the questionnaires on paper.


Measures 
The CIL scale 


CIL was defined as an “individual's ability to use computers to investigate, create, and communicate in order to participate effectively at home, at school, in the workplace, and in society” (Fraillon et al. 2019, p. 16). The ICILS 2018 CIL construct was conceptualized around four strands that framed skills and knowledge addressed by the CIL instruments.³ We used the Rasch item response theory model (Rasch 1960) to derive the CIL scale from the data collected from student responses to 81 test questions and large tasks that generated 102 score points. Most questions and tasks each corresponded to one item. However, raters scored each ICILS large task against a set of criteria (each criterion with its own unique set of scores) relating to the specific properties of the task. Each large task assessment criterion can therefore be regarded as an item in ICILS.
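To make the Rasch model concrete, the sketch below implements its dichotomous form, in which the probability of a correct response depends only on the difference between a student's ability θ and an item's difficulty δ: P(X = 1) = exp(θ − δ) / (1 + exp(θ − δ)). It is an illustration under stated assumptions, not the ICILS scaling software: it covers only 0/1 items (large-task criteria with more than two score categories would need a polytomous extension, which the text does not specify), it treats item difficulties as already known, and the function names are hypothetical. ICILS reported plausible values rather than the maximum likelihood point estimates computed here.

```python
import math


def rasch_probability(theta, delta):
    """P(correct) under the dichotomous Rasch model.

    theta: student ability (logits); delta: item difficulty (logits).
    """
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def expected_score(theta, deltas):
    """Expected raw score on a set of dichotomous items at ability theta."""
    return sum(rasch_probability(theta, d) for d in deltas)


def estimate_theta(responses, deltas, tol=1e-10, max_iter=100):
    """Maximum likelihood ability estimate via Newton-Raphson.

    Assumes known item difficulties. No finite ML estimate exists
    for all-correct or all-incorrect response patterns.
    """
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("no finite ML estimate for zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_probability(theta, d) for d in deltas]
        gradient = raw - sum(probs)                      # observed minus expected score
        information = sum(p * (1.0 - p) for p in probs)  # test information at theta
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta
```

For example, `estimate_theta([1, 1, 0], [-0.5, 0.0, 0.5])` returns the ability at which `expected_score` equals the raw score of 2, which is the defining property of the Rasch maximum likelihood estimate.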


The ICILS CIL reporting scale was established in ICILS 2013, with a mean of 500 (the average CIL scale score across countries in 2013) and a standard deviation of 100 for the equally weighted national samples that met IEA sample participation standards in the first cycle. Plausible values were generated with full conditioning to derive summary student achievement statistics.
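The reporting metric described above amounts to a linear transformation of the logit scale. The sketch below shows one way to compute it, under assumptions the text does not spell out: each student has a single ability value (operationally ICILS used plausible values), "equally weighted" is implemented by giving every country's sample the same total weight, within-country sampling weights are ignored, and the standard deviation is the weighted population form. All function names are hypothetical.

```python
import math
from collections import Counter


def equal_country_weights(countries):
    """Weight each student 1/n_c so that every country contributes a total weight of 1."""
    counts = Counter(countries)
    return [1.0 / counts[c] for c in countries]


def weighted_mean_sd(values, weights):
    """Weighted mean and (population) standard deviation."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)


def make_transform(logits_2013, countries_2013, target_mean=500.0, target_sd=100.0):
    """Build a logit-to-reporting-scale function from 2013 calibration data.

    The pooled, equally weighted 2013 distribution is mapped to mean 500, SD 100.
    """
    weights = equal_country_weights(countries_2013)
    mean, sd = weighted_mean_sd(logits_2013, weights)

    def to_scale(theta):
        return target_mean + target_sd * (theta - mean) / sd

    return to_scale
```

Because the constants are fixed on the 2013 data, 2018 abilities expressed on the same logit metric (which the trend modules make possible) could be passed through the same `to_scale` function, so that a score of 500 keeps its 2013 meaning.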


The described scale of CIL achievement in ICILS is based on the content and scaled difficulties of 
the assessment items. The ICILS research team wrote descriptors for the expected CIL knowledge, 
skills, and understandings demonstrated by students who correctly responded to these items. 
Ordering the item descriptors according to their scaled difficulty (from least to most difficult) was 
used to develop an item map. The content of the items was used to inform judgements about the 
skills represented by groups of items on the scale ordered by difficulties. 


Analyses of this item map and of the student achievement data were then used to establish proficiency levels that each had a width of 85 scale points, with level boundaries set at 407, 492, 576, and 661 scale points (rounded to the nearest whole number). Student scores below 407 scale points indicate CIL proficiency below the lowest level targeted by the assessment instrument.
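Classifying a student score into these levels is a lookup against the boundaries. In the sketch below, the boundary values come from the text; the level labels, the convention that a score exactly on a boundary belongs to the higher level, and the open-ended top level are assumptions made for illustration.

```python
from bisect import bisect_right

# Rounded level boundaries given in the text (CIL scale points).
BOUNDARIES = [407, 492, 576, 661]
# Hypothetical labels: one band below the lowest boundary, then four levels.
LABELS = ["Below Level 1", "Level 1", "Level 2", "Level 3", "Level 4"]


def cil_level(score):
    """Return the proficiency band for a CIL scale score.

    bisect_right counts how many boundaries are <= score, so a score
    exactly on a boundary is placed in the higher band.
    """
    return LABELS[bisect_right(BOUNDARIES, score)]


def level_widths(boundaries=BOUNDARIES):
    """Width of each bounded level in rounded scale points."""
    return [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]
```

`level_widths()` returns `[85, 84, 85]`; the one-point shortfall in the middle level is consistent with the text's note that the 85-point-wide level boundaries were rounded to whole numbers.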


3 CIL was described as comprising only two strands in ICILS 2013. Following an extensive evaluation of the ICILS 2013 CIL construct and in consultation with ICILS national researchers, the ICILS project team established a revised structure for the CIL construct for ICILS 2018. The restructuring of the CIL construct was undertaken to better communicate the contents and emphases of the construct and to minimize overlap across the aspects of the construct (Fraillon et al. 2019, p. 17).
CT refers to an “individual’s ability to recognize aspects of real-world problems which are appropriate for computational formulation and to evaluate and develop algorithmic solutions to those problems so that the solutions could be operationalized with a computer” (Fraillon et al. 2019, p. 27). The CT construct comprised two strands: conceptualizing problems and operationalizing solutions.
Given the limited number of CT tasks and score points, it was not possible to establish proficiency levels in the same way as for CIL. However, to provide a broad description of the underlying characteristics of achievement across the breadth of the scale, we divided the items ordered by their difficulty into thirds with equal numbers of items in each third. For ICILS 2018 we refer to these as the lower, middle, and upper regions of the CT scale. The descriptions of each region are syntheses of the common elements of CT knowledge, skills, and understandings needed to complete the items within each region. The regions of the CT scale cannot be equated with the proficiency levels in the CIL scale, as the two scales are not comparable.
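The procedure for forming the CT regions can be sketched as sorting items by scaled difficulty and cutting the ordered list into three equal parts. The item identifiers and difficulties in the example are invented, the function name is hypothetical, and the sketch rejects item counts that are not a multiple of three because the text specifies equal numbers of items per third.

```python
def ct_regions(item_difficulties):
    """Split items into lower, middle, and upper thirds by scaled difficulty.

    item_difficulties: mapping of item identifier -> difficulty (logits).
    Items are ordered from least to most difficult; ties keep input order.
    """
    n = len(item_difficulties)
    if n == 0 or n % 3 != 0:
        raise ValueError("equal thirds require a positive multiple of three items")
    ordered = [item for item, _ in sorted(item_difficulties.items(), key=lambda kv: kv[1])]
    k = n // 3
    return {
        "lower": ordered[:k],
        "middle": ordered[k:2 * k],
        "upper": ordered[2 * k:],
    }


example = {"i1": 0.3, "i2": -1.2, "i3": 1.5, "i4": -0.4, "i5": 0.9, "i6": 2.1}
print(ct_regions(example))
# {'lower': ['i2', 'i4'], 'middle': ['i1', 'i5'], 'upper': ['i3', 'i6']}
```

Note that the cut points produced this way depend on how the item difficulties are distributed, whereas the CIL levels have fixed widths on the scale.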
Measures based on the student questionnaire
Measures based on the teacher questionnaire


A number of the measures based on the teacher questionnaire were also single-item indices. 
Such measures included experience with using computers for teaching purposes, and frequency 
Measures based on the school questionnaire


The school questionnaires (for the school principal and the ICT coordinator) provided measures based on information about school characteristics, school contexts, policies and practices for using ICT, impediments to using ICT in teaching and learning, and participation in teacher professional development.
The twelve countries that participated in ICILS were: Chile, Denmark, Finland, France, Germany, Italy, Kazakhstan, the Republic of Korea (hereafter referred to as Korea, for ease of reading), Luxembourg, Portugal, the United States, and Uruguay. Moscow (Russian Federation) and North Rhine-Westphalia (Germany) took part as benchmarking participants.
[…] the CT international option.
[…] in most countries), provided that the average age of students in this grade was at least 13.5 years at the time of the assessment.
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The population for the ICILS teacher survey was defined as all teachers teaching regular school 
subjects to the students in the target grade. It included only those teachers who were teaching the 
target grade during the testing period and who had been employed at the school since the beginning of the school year. ICILS also administered separate questionnaires to principals and nominated ICT coordinators in each school.
Italy decided to survey students and their teachers at the beginning of the school year, while all other countries administered the survey at the end of the school year. Italy's results were annotated accordingly in the international report.
Outline of the technical report
This overview of ICILS 2018 is followed by 12 chapters. Chapters 2, 3, and 4 cover the instruments that were used in the study. Chapter 2 focuses on the development of the tests while Chapter 3 provides an account of the computer-based assessment systems. Chapter 4 details the development of the questionnaires used in ICILS for gathering data from students, teachers, principals, and school ICT coordinators. The chapter also provides an outline of the development of the national contexts survey, completed by the national research coordinators. An appreciation of the material in these chapters provides an essential foundation for interpreting the results of the study.

Chapters 5 through 9 focus on the implementation of the survey in 2018. Chapter 5 describes the translation procedures and national adaptations used in ICILS. Chapter 6 details the sampling design and implementation, while Chapter 7 describes the sampling weights that were applied and documents the participation rates that were achieved. Chapter 8, which describes the field operations, is closely linked to Chapter 9, which reports on the feedback and observations gathered from the participating countries during the data collection.

Chapters 10, 11, and 12 are concerned with data management and analysis. Chapter 10 describes the data-management processes that resulted in the creation of the ICILS database. Chapter 11 details the scaling procedures for the CIL test, i.e., how the responses to tasks and items were used to generate the scale scores and proficiency levels. Chapter 12 describes the scaling procedures for the questionnaire items (student, teacher, and school questionnaires). The final chapter, Chapter 13, presents an account of the analyses that underpinned the international report of ICILS 2018 (Fraillon et al. 2020).
CHAPTER 2:


ICILS 2018 test development 


Julian Fraillon 


Introduction 


The new content for inclusion in the ICILS 2018 assessment was developed over a 20-month period from April 2015 to December 2016. Most of this work was conducted by the international study center (ISC) at ACER in collaboration with national research coordinators (NRCs) and other research partners.


The ICILS 2018 assessment included two tests. The test of computer and information literacy (CIL) was completed by all students and the test of computational thinking (CT) was completed by all students in countries that elected to undertake this additional assessment (see Chapter 1 for further details of country participation in ICILS).

This chapter provides a detailed description of the test development process and review procedures as well as the test design implemented for the ICILS 2018 field trial and main survey. The processes are relevant for the tests of both CIL and CT and are consequently described together. Where relevant, details of the content of each test are described separately.

Table 2.1 provides an overview of the test development processes and timeline.


Test scope and format


ICILS 2018 assessment framework 


The ICILS student tests of CIL and CT were developed with reference to the ICILS 2018 assessment framework¹ (Fraillon et al. 2019). CIL was defined as an “individual’s ability to use computers to investigate, create, and communicate in order to participate effectively at home, at school, in the workplace, and in society” (Fraillon et al. 2019, p. 16) and CT was defined as the “ability to recognize aspects of real-world problems which are appropriate for computational formulation and to evaluate and develop algorithmic solutions to those problems so that the solutions could be operationalized with a computer” (Fraillon et al. 2019, p. 27).

CIL and CT are each described in the ICILS 2018 assessment framework in terms of strands (overarching conceptual categories) and aspects (specific content categories within strands). However, the described structures of CIL and CT were not intended to presuppose a sub-dimensional analytic structure (see Fraillon et al. 2019 for further details).


The CIL framework 


The following list sets out the four strands and corresponding aspects of the CIL framework. Full 
details of the CIL construct can be found in the ICILS 2018 assessment framework. 


• Strand 1: Understanding computer use, comprising two aspects:
  – Aspect 1.1: Foundations of computer use
  – Aspect 1.2: Computer use conventions
• Strand 2: Gathering information, comprising two aspects:
  – Aspect 2.1: Accessing and evaluating information
  – Aspect 2.2: Managing information
• Strand 3: Producing information, comprising two aspects:
  – Aspect 3.1: Transforming information
  – Aspect 3.2: Creating information
• Strand 4: Digital communication, comprising two aspects:
  – Aspect 4.1: Sharing information
  – Aspect 4.2: Using information responsibly and safely


1 The framework can be downloaded from: https://www.springer.com/gp/book/9783030193881


The CT framework


The following list sets out the two strands and corresponding aspects of the CT framework. Full details of the CT construct can be found in the ICILS 2018 assessment framework.

• Strand 1: Conceptualizing problems, comprising three aspects:
  – Aspect 1.1: Knowing about and understanding digital systems
  – Aspect 1.2: Formulating and analyzing problems
  – Aspect 1.3: Collecting and representing relevant data
• Strand 2: Operationalizing solutions, comprising two aspects:
  – Aspect 2.1: Planning and evaluating solutions
  – Aspect 2.2: Developing algorithms, programs, and interfaces


The ICILS test instruments 


The CIL test instrument 
The questions and tasks making up the ICILS CIL test instrument were presented in five modules, each of which took 30 minutes to complete. Each student completed two modules randomly allocated from the set of five. Three of the modules were secure trend modules first used in ICILS 2013 that provided a basis for reporting all CIL data collected in ICILS 2018 on the ICILS CIL achievement scale that was established in 2013. Two of the modules were newly developed for inclusion in ICILS 2018.

A module is a set of tasks based on an authentic theme and following a linear narrative structure. Each module has a series of discrete tasks, each of which typically takes less than a minute to complete, followed by a large task that typically takes 15 to 20 minutes to complete. The narrative of each module positions the discrete tasks as a mix of skill-execution and information-management tasks that students need to do in preparation for completing the large task.

When beginning each module, students were presented with an overview of the theme and purpose of the tasks in the module as well as a basic description of what the large task would comprise. Students were required to complete the tasks in the allocated sequence and could not return to review completed tasks. Table 2.2 includes a summary of the five ICILS CIL assessment modules, including the large tasks.

The ICILS CIL test modules included three broad categories of tasks, described below:

• Information-based response tasks: These tasks make use of the “digital interface to deliver pencil-and-paper style questions in a slightly richer format than traditional paper-based methods” (Fraillon et al. 2019, p. 46). The response formats for these tasks can vary (e.g., multiple choice, short constructed response, drag and drop), the stimulus material for these tasks is usually static or has minimal interactivity, and the purpose of these tasks is to “capture evidence of students’ knowledge and understanding of CIL independently of students using anything beyond the most basic skills required to record a response” (Fraillon et al. 2019, p. 46).
Table 2.1: Test development processes and timeline

Year | Month | Group | Activity
2015 | February | ICILS international study center | Establishment of CIL test specifications and preliminary CT test specifications
2015 | March | First meeting of national research coordinators (Krakow) | Reflections on ICILS 2013 test of CIL; Review of proposed test development process and test specifications; Test development workshop
2015 | March | ICILS international study center | Drafting, review, and refinement of test modules
2015 | December | National research coordinators | Web-based review of test module storyboards (1)
2016 | January | ICILS international study center | Revision of test module storyboards
2016 | February | Second meeting of national research coordinators (Amsterdam) | Review of draft test modules
2016 | March | ICILS international study center | Revision of draft test modules following second meeting of national research coordinators and pilot-testing of selected module content
2016 | April | National research coordinators | Web-based review of test module storyboards (2)
2016 | May | ICILS international study center | Revision of test modules and development of content in online IEA test delivery system
2016 | September | Third meeting of national research coordinators (Porto) | Review of modules proposed for inclusion in field trial test and confirmation of test design
2016 | October | ICILS international study center | Finalization of field trial test modules
2016 | November | ICILS field trial scoring trainers | Review of field trial scoring guides for constructed-response items and large tasks (as part of scoring training)
2016 | December | ICILS international study center | Revision of field trial scoring guides for constructed-response items and large tasks
2017 | July | ICILS international study center | Migration of test content to online delivery system
2017 | August | ICILS international study center | Analysis of field trial item data and recommendations for modules/items to be included in main survey test (field trial analysis report)
2017 | September | Fourth meeting of national research coordinators (Berlin) | Review of field trial analysis report and recommendations for test design and modules/items proposed for inclusion in main survey
2017 | December | ICILS main survey scoring trainers (Hamburg) | Review of main survey scoring guides for constructed-response items and large tasks (as part of scoring training)
2018 | January | ICILS international study center | Finalization of main survey scoring guides for constructed-response items and large tasks


• Skills tasks: In these tasks students “use interactive simulations of generic software or universal applications to complete an action” (Fraillon et al. 2019, p. 47). The number of steps required to complete a skills task and the number of different correct methods for executing a task vary across skills tasks. Linear skills tasks require students to execute one or more commands in a given sequence (such as copy and paste) whereas nonlinear skills tasks require students to execute a function involving more than one sub-command without a single given sequence (such as using the filter functions in an online database to locate information).


• Authoring tasks: These tasks “require students to modify and create information products using authentic computer software applications” (Fraillon et al. 2019, p. 49). The complexity of authoring tasks varies according to the number of different applications students were required to use, the range of viable solutions to the task, and the amount of information students were required to evaluate and make use of when completing the task. The authoring tasks were most commonly the large task within each module and were typically scored using multiple analytic criteria applied by human scorers.
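The distinction between linear and nonlinear skills tasks can be illustrated with a toy check on a logged sequence of actions. This is a hypothetical sketch, not ICILS scoring logic (the text does not describe how skills tasks were scored): the action names are invented, a linear task is treated as correct when the required commands occur in the given order (other actions may be interleaved), and a nonlinear task as correct when all required sub-commands occur in any order.

```python
def linear_task_correct(actions, required_sequence):
    """True if required_sequence occurs in actions in order (gaps allowed).

    `step in it` advances the shared iterator, so each required step must
    be found after the previous one.
    """
    it = iter(actions)
    return all(step in it for step in required_sequence)


def nonlinear_task_correct(actions, required_steps):
    """True if every required sub-command occurs, in any order."""
    return set(required_steps).issubset(actions)


log = ["select_text", "copy", "click_elsewhere", "paste"]
print(linear_task_correct(log, ["select_text", "copy", "paste"]))  # True
print(linear_task_correct(["paste", "select_text", "copy"],
                          ["select_text", "copy", "paste"]))       # False
print(nonlinear_task_correct(["filter_price", "filter_region"],
                             ["filter_region", "filter_price"]))   # True
```

The order-sensitive check mirrors the "given sequence" of a copy-and-paste task, while the set-based check mirrors applying several database filters in whatever order a student chooses.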


Further details of the ICILS CIL test modules, including example tasks, are presented in the ICILS 
2018 assessment framework (Fraillon et al. 2019) and the ICILS 2018 international report (Fraillon 
et al. 2020). 


Table 2.2: Summary of ICILS 2018 CIL test modules and large tasks

Module | Description and large task
Band competition (also used in ICILS 2013) | Students plan a website, edit an image, and use a simple website builder to create a webpage with information about a school band competition.
Breathing (also used in ICILS 2013) | Students manage files and evaluate and collect information to create a presentation to explain the process of breathing to eight- or nine-year-old students.
School trip (also used in ICILS 2013) | Students help plan a school trip using online database tools and select and adapt information to produce an information sheet about the trip for their peers. The information sheet includes a map created using an online mapping tool.
Board games (new for ICILS 2018) | Students use a school-based social network for direct messaging and group posting to encourage peers to join a board games interest group.
Recycling (new for ICILS 2018) | Students access and evaluate information from a video sharing website to identify a suitable information source relating to waste reduction, reuse, and recycling. Students take research notes from the video and use their notes as the basis for designing an infographic to raise awareness about waste reduction, reuse, and recycling.


The CT test instrument 


The tasks making up the ICILS CT test instrument were presented in two modules, each of which took 25 minutes to complete. Each student completed both modules (in randomized order across students) after they had completed both the CIL test and the student questionnaire. Both modules were newly developed for inclusion in ICILS 2018.

Similar to the CIL assessment, each CT test module contained a set of tasks linked by a common narrative theme. Unlike the CIL modules, the CT modules did not culminate in students completing a large task; instead, each module comprised a set of items associated with the processes of planning and execution of computer-based solutions to real-world problems.

The first of the modules, automated bus, focused on content associated with the conceptualizing problems strand of the CT framework. The narrative theme of the module related to planning aspects of programmable decision-making to be implemented in a driverless bus, such as route planning and traveling at safe distances to avoid collisions. The tasks in this module included visual representation of real-world scenarios (through, for example, path diagrams, flow charts, and decision trees) and configuring, running, and interpreting results of a simulation tool.

The second module, farm drone, focused on content associated with the operationalizing solutions strand of the CT framework. The narrative theme of this module related to the use of visual code to control the actions of a drone used in farming. In the farm drone module students were able to return to earlier tasks to check and revise their responses.


Like the CIL test modules, the CT test modules contain information-based response tasks and 
skills tasks. However, in addition to these, the CT test modules include task types that are unique 
to the CT assessment. The three CT-specific task types are described below. 


• Nonlinear systems transfer tasks: These tasks require students to “interpret, transfer and adapt
algorithmic information so that the outcomes of the application of algorithmic instructions can 
be displayed visually” (Fraillon et al. 2019, p. 51). The response formats for these tasks can vary 
but what they have in common is that students are required to make connections between a 
visual representation of an algorithmic sequence and the steps of the sequence described as 
text. 


• Simulation tasks: These tasks require students to work with some form of simulation tool,
typically as part of developing an understanding of real-world problems or to evaluate solutions 
to problems. The tasks can have students set parameters on the tool, run a simulation, collect 
data, and interpret the results. 


• Visual coding tasks: These tasks require students to manipulate visual code blocks that can be used to execute a range of actions. In ICILS 2018, these tasks focused on managing code blocks that could control the movement and some basic actions of a virtual drone. Students could assemble the code blocks in a work space, rearrange the code blocks, and configure some aspects of the code blocks (such as the number of repeats in a loop or the material dropped by the drone). Students could also run the code at any time and view the activation of code blocks and the corresponding behavior of the drone. They could also separately reset both the drone and the code blocks/work space to their original states. There were two main forms of visual coding tasks:


i. Algorithm construction tasks require students to assemble sequences of code blocks in order 
for the drone to execute a prescribed set of actions. 


ii. Algorithm debugging tasks require students to correct an existing flawed algorithm (provided 
to students as an editable configuration of code blocks in the work space) so that the drone 
could execute a prescribed set of actions. 
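The two forms of visual coding task can be illustrated with a minimal interpreter sketch. The block types (`Move`, `Drop`, `Repeat`), the grid coordinates, and the `run` function are hypothetical stand-ins chosen for illustration; they are not the actual ICILS block set or delivery system, and the conditional “if... do” block is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical code blocks; not the actual ICILS block set.
@dataclass
class Move:
    dx: int  # squares to move along x
    dy: int  # squares to move along y

@dataclass
class Drop:
    material: str  # e.g. "water", "seed", or "fertilizer"

@dataclass
class Repeat:
    times: int  # configurable number of repeats
    body: list  # blocks executed on each iteration

@dataclass
class Drone:
    x: int = 0
    y: int = 0
    # grid square -> materials dropped there, in drop order
    drops: Dict[Tuple[int, int], List[str]] = field(default_factory=dict)

def run(blocks, drone):
    """Execute blocks in order, expanding Repeat blocks recursively."""
    for block in blocks:
        if isinstance(block, Move):
            drone.x += block.dx
            drone.y += block.dy
        elif isinstance(block, Drop):
            drone.drops.setdefault((drone.x, drone.y), []).append(block.material)
        elif isinstance(block, Repeat):
            for _ in range(block.times):
                run(block.body, drone)
        else:
            raise TypeError(f"unknown block: {block!r}")
    return drone

# A debugging task in miniature: the flawed program repeats twice, but three
# squares need watering; changing one configurable setting fixes it.
flawed = [Repeat(2, [Move(1, 0), Drop("water")])]
fixed = [Repeat(3, [Move(1, 0), Drop("water")])]
print(run(fixed, Drone()).drops)
```

An algorithm construction task corresponds to building `fixed` from an empty work space; an algorithm debugging task starts from something like `flawed` and asks for the minimal edit.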


Further details of the ICILS CT test modules, including example tasks, are presented in the ICILS 
2018 assessment framework (Fraillon et al. 2019) and the ICILS 2018 international report (Fraillon 
et al. 2020). 


Test-development process 


The test-development process consisted of a series of stages applicable to the development of 
both the CIL and CT test instruments. Although these stages followed each other sequentially, the 
iterative and collaborative nature of the overall process meant that some materials were reviewed 
and revised within particular stages more than once. In summary, the ISC developed item materials
(sometimes based on suggestions from NRCs), which were reviewed by NRCs and then revised 
by the ISC. Sometimes this process was repeated. 


In ICILS, each task or item comprises the stimulus materials available to students, the question or
instructions given to students, the specified behavior of the computer delivery system in response 
to students’ actions, and the scoring logic (as specified in the scoring guides for human scoring) for 
each item or task. The test development and review process encompassed all of these constituent 
parts of the ICILS modules. 
Drafting of preliminary module ideas

The first meeting of the ICILS 2018 NRCs included a reflection on the CIL test from ICILS 2013, discussion of the potential for assessing CT in ICILS 2018, and a module development workshop in which participants were introduced to the questions and issues directing the creation and evaluation of the modules and tasks. These review criteria, which remained valid throughout the module development and review process, were applied by test development staff at the ISC and reviewers alike. The following lists present the main review questions used to evaluate the ICILS modules:


• Content validity
- How did the material relate to the ICILS test specifications?
- Did the tasks test the construct (CIL or CT) described in the assessment framework?
- Did the tasks relate to content at the core of the aspects of the assessment framework or focus on trivial side issues?
- How would the ICILS test content stand up to broader expert and public scrutiny?

• Clarity and context
- Were the tasks and stimulus material coherent, unambiguous, and clear?
- Were the modules and tasks interesting, worthwhile, and relevant?
- Did the tasks assume prior knowledge and, if so, was this assumed to be acceptable or part of what the test intended to measure?
- Was the reading load as low as possible without compromising the real-world relevance and validity of the tasks?
- Were there idioms or syntactic structures that may prove difficult to translate into other languages?

• Test-takers
- Did the content of the modules and tasks match the expected range of ability levels, age, and maturity of the ICILS target population?
- Did the material appear to be cross-culturally relevant and sensitive?
- Were specific items or tasks likely to be easier or harder for certain subgroups in the target population for reasons other than differences in the ability measured by the test?
- Did the constructed-response items and the large-task information provide clear guidance about what was expected in response to the items and tasks?

• Format and scoring
- Was the proposed format the most suitable for the framework content being assessed by each task?
- Was the key (the correct answer to a multiple-choice question) indisputably correct?
- Were the distractors (the incorrect options to a multiple-choice question) plausible but also irrefutably incorrect?
- Did the scoring criteria for the large tasks assess the essential characteristics of task completion?
- Were there different approaches to providing answers with the same score, and did they represent equivalent or different levels of proficiency?
- Was the proposed scoring consistent with the underlying ability measured by the test (CIL), and would test respondents with higher ability levels always score better than those with lower ones?
- Were there other kinds of answers that had not been anticipated in the scoring guides (e.g., any that did not fall within the “correct” answer category description, but appeared to be equally correct)?
- Were the scoring criteria sufficient for scorers and did they clearly distinguish the different levels of performance?


At the module development workshop, NRCs were invited to discuss and suggest new ideas for 
module themes. These themes were taken as a starting point for subsequent module development.


Development of module storyboards 


After the first NRC meeting, the ISC developed storyboards for six new modules (three for each 
of CIL and CT).


The storyboards were presented in the form of a Microsoft PowerPoint mock-up of each task/item in sequence, so that those viewing the presentation could see the narrative sequence of tasks in the module.


Each PowerPoint slide contained the stimulus material together with the task instructions or 
question that students would be required to respond to, and each storyboard was accompanied 
by a set of implementation notes for each task or question. The notes described the planned 
functionality/behavior of each task and provided instructions on how the task was to be scored. 
Instructions were provided not only for the human-scored tasks but also for the tasks that would 
be scored automatically by the computer system. 


First online review of module storyboards 
In December 2015, NRCs took part in an online review of the draft storyboards. When reviewing 
the draft storyboards, NRCs made recommendations relating to the modules. The NRCs’ feedback 
informed further revision of the module storyboards and preparation for detailed discussion of 
the modules at the second meeting of NRCs. 


Face-to-face review of draft module storyboards 


One focus of the second meeting of NRCs was to review the six draft module storyboards. 


Following this meeting, four module storyboards (two CIL and two CT modules) were selected 
for further development and revision. Feedback from the meeting was used to inform the further 
revision of the four module storyboards. 


Second online review of draft module storyboards 


Once the module storyboards had been revised following the second meeting of NRCs, the 
storyboards were made available online to NRCs for further comment. The feedback on these 
modules was used to finalize the draft storyboards to be enacted within the online delivery system. 


Authoring the draft storyboards in the field trial online delivery system 


The finalized storyboards were provided to IEA to be authored into the test delivery system, which 
meant they could then be viewed, in draft form, with their expected functionality. This process 
served two purposes: it enabled the draft modules to be reviewed and refined with reference to 
their live functionality and it enabled the functionality of the test delivery system to be tested 
and refined. 


Face-to-face review of field-trial modules and finalization 

Operational versions of the proposed field trial modules were made available to NRCs for review 
at the third meeting of NRCs. The content and functionality of the field trial modules were revised 
and finalized in response to this review. 
Field-trial scorer training

National center representatives attended an international scorer training meeting held before the 
field trial. These representatives subsequently trained the national center staff in charge of scoring 
student responses in their respective countries. Feedback from the scoring training process led 
to refinements to the scoring guides. 


Authoring the draft storyboards in the main survey online delivery system 


Following the field trial, the module content was provided to RM Results (formerly SONET Systems) 
to be authored into the main survey test delivery system. Further review and refinement of the 
content and function of the modules was conducted in this system. 


Field trial analysis review and selection of items for the main survey 


Field trial data were used to investigate the measurement properties of the ICILS test items at the 
international level and within countries. Having recommended which modules and tasks should 
be included in the main survey instrument, ISC staff discussed their recommendations with NRCs 
at the fourth ICILS NRC meeting. During the meeting, small refinements were recommended for 
a number of tasks. The NRCs also strongly recommended that all four test modules used in the 
field trial be retained for use in the main survey given that all four exhibited satisfactory validity, 
functioning, and measurement properties. 


Post-field trial revision 


Minor modifications were made to a small number of tasks and the functionality of all tasks was further reviewed and refined. The main survey instruments, comprising five CIL test modules (three trend modules and two newly developed modules) and two CT test modules, were then finalized.


Main survey scorer training 


The main survey international scorer training meeting provided a final opportunity to reflect on 
the experience of scoring the field trial responses and to further review the scoring guides. The 
meeting was again attended by national center representatives who were responsible for training 
the national center staff in charge of scoring student responses in their respective countries. 
Feedback from the second international scorer training meeting along with student achievement 
data and the reported experiences of scorers during the field trial prompted further refinements 
to the scoring guides. 


Field trial test design and content 


Test design 


In this report we refer to tasks, items, and score points when describing the ICILS tests of CIL 
and CT. 


The term task refers to the instructions given to students and the actions required to complete them. 
As described previously, the ICILS test modules include a range of information-based response 
tasks, and skills, authoring, and coding tasks. 


The term item refers to the variable or variables derived using the scored student responses to 
each task that were used to create the scales of CIL and CT and to measure student achievement 
against them. Achievement on tasks, such as information-based response tasks, and skills and 
coding tasks, was typically measured using one item per task. Achievement on the authoring tasks 
was measured using many items (scoring criteria) per task. 


The term score points refers to the number of discrete non-zero score categories per item. The items associated with skills tasks typically elicited one score point each (e.g., correct = 1, incorrect = 0), whereas most of the items associated with information-based response tasks, authoring tasks, and coding tasks elicited more than one score point (e.g., full credit = 3, partial credit (high) = 2, partial credit (low) = 1, and no credit = 0).
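To make the distinction between tasks, items, and score points concrete, the sketch below counts the score points for a hypothetical authoring task scored with three criteria, each criterion being one item. The criterion names and their score categories are invented for illustration.

```python
# Hypothetical authoring task: each scoring criterion is one item, listed
# with the score categories it can award (0 = no credit).
authoring_task = {
    "criterion_layout": [0, 1],
    "criterion_audience": [0, 1, 2],
    "criterion_content": [0, 1, 2, 3],  # no, low partial, high partial, full credit
}

def score_points(categories):
    """Score points = number of discrete non-zero score categories for an item."""
    return len({c for c in categories if c != 0})

items = len(authoring_task)
points = sum(score_points(cats) for cats in authoring_task.values())
print(f"{items} items, {points} score points")
```

One task thus contributes three items and six score points to the scale, which is why the item and score point totals reported for the tests differ from the task counts.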


The CIL field trial test instrument consisted of five test modules with a total of 51 tasks which 
yielded data for 86 items (all authoring tasks and a small number of skills tasks were assessed 
using more than one criterion, with each criterion constituting an item). The selection of tasks, 
including their format, was determined by the nature of the content the tasks were assessing, 
the tasks’ potential range of response types, and their role in the narrative flow of each module. 


The ICILS research team had earlier decided not to have the same balance of task types and task 
formats within each module but rather to have a mix across the five modules of information-based response tasks, skills tasks, and authoring tasks. Table 2.3 shows the composition of the field trial CIL test modules by items derived from the different task types. Overall, 30 percent of the items were derived from information-based response tasks, 22 percent from skills tasks, and 48 percent from authoring tasks.


The CT field trial test instrument consisted of two test modules with a total of 21 tasks yielding data for 19 items (Table 2.4).2


Table 2.3: Field trial CIL test module composition by items derived from tasks 


Band competition 1 2 2 2 3 7 17
Breathing 0 3 0 3 0 10 16
School trip 3 0 0 4 1 7 15
Board games 3 4 0 1 0 6 14
Recycling* 3 5 0 1 4 11 24
Total 10 14 2 11 8 41 86
Note: *The recycling module included a short note-taking task and a communication task. In this table both are classified as authoring (large) tasks.
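The item-type percentages can be recomputed from the module rows of Table 2.3. This is a minimal sketch. The table's column headings did not survive extraction, so it assumes that the first three numeric columns are information-based response items, the next two are skills items, and the sixth is authoring items; this grouping is inferred from the totals rather than stated in the source.

```python
# Field trial CIL rows from Table 2.3: six item-count columns, then the row total.
TABLE_2_3 = {
    "Band competition": (1, 2, 2, 2, 3, 7, 17),
    "Breathing": (0, 3, 0, 3, 0, 10, 16),
    "School trip": (3, 0, 0, 4, 1, 7, 15),
    "Board games": (3, 4, 0, 1, 0, 6, 14),
    "Recycling": (3, 5, 0, 1, 4, 11, 24),
}

# ASSUMPTION: column grouping inferred from the totals, not from table headings.
GROUPS = {
    "information-based response": (0, 1, 2),
    "skills": (3, 4),
    "authoring": (5,),
}

def item_type_shares(table, groups):
    """Check each row total, then return (total items, {group: (n, percent)})."""
    for module, row in table.items():
        if sum(row[:-1]) != row[-1]:
            raise ValueError(f"row total mismatch for {module}")
    total = sum(row[-1] for row in table.values())
    shares = {}
    for group, cols in groups.items():
        n = sum(row[c] for row in table.values() for c in cols)
        shares[group] = (n, round(100 * n / total))
    return total, shares

total, shares = item_type_shares(TABLE_2_3, GROUPS)
print(total, shares)
```

Under that assumption the groups hold 26, 19, and 41 of the 86 items, or about 30, 22, and 48 percent. The row-total check also catches transcription slips when the table is re-keyed.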


Table 2.4: Field trial CT test module composition by items derived from tasks 


Automated bus 1 1 4 2 0 0 8
Farm drone 0 2 1 0 6 2 11
Total 1 3 5 2 6 2 19


2 Data for two tasks were not used in the field trial scaling analysis. 
Table 2.5: Field trial test form design and contents

Field trial coverage of the CIL framework


All field trial items were developed according to and mapped against the ICILS CIL framework. 
Table 2.6 shows this mapping. 


Table 2.6: Field trial CIL item mapping to the CIL framework 


CIL framework aspect Items (n) Items (%) Score points (n) Score points (%)
1.1 Foundations of computer use 4 5 5 4
1.2 Computer use conventions 9 10 10 8
2.1 Accessing and evaluating information 16 19 24 19
2.2 Managing information 9 10 11 9
3.1 Transforming information 14 16 23 18
3.2 Creating information 23 27 40 31
4.1 Sharing information 7 8 9 7
4.2 Using information responsibly and safely 4 5 6 5
Total 86 100 128 100


As stated in the ICILS 2018 assessment framework, “[t]he test design of ICILS was not planned to assess equal proportions of all aspects of the CIL construct, but rather to ensure some coverage of all aspects as part of an authentic set of assessment activities in context” (Fraillon et al. 2019, p. 54). The intention that the four strands would be adequately represented in the test was achieved. Twelve percent of score points related to Strand 1, 27 percent to Strand 2, 49 percent to Strand 3, and 12 percent to Strand 4. These proportions corresponded to the amount of time students were expected to spend on each strand’s complement of tasks. Aspects 2.1, 3.1, and 3.2 were assessed primarily via the large tasks at the end of each module, with students expected to spend roughly two thirds of their working time on these tasks.
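Strand-level shares are simple aggregations of the aspect rows in the mapping tables. The sketch below uses a hypothetical `strand_shares` helper that sums score points by strand (the number before the dot in an aspect code) and rounds to whole percentages. Three inputs were illegible in the extracted tables and are inferred here from the column totals and printed percentages: 5 and 10 score points for CIL aspects 1.1 and 1.2, and 17 for CT aspect 2.2.

```python
from collections import defaultdict

def strand_shares(points_by_aspect):
    """Sum score points per strand (the number before the dot in an aspect
    code such as "3.2") and return each strand's rounded share in percent."""
    by_strand = defaultdict(int)
    for aspect, pts in points_by_aspect.items():
        by_strand[int(aspect.split(".")[0])] += pts
    total = sum(by_strand.values())
    return {strand: round(100 * pts / total) for strand, pts in sorted(by_strand.items())}

# Score points per aspect, field trial CIL (Table 2.6).
cil_field_trial = {"1.1": 5, "1.2": 10, "2.1": 24, "2.2": 11,
                   "3.1": 23, "3.2": 40, "4.1": 9, "4.2": 6}
# Score points per aspect, field trial CT (Table 2.7).
ct_field_trial = {"1.1": 7, "1.2": 5, "1.3": 4, "2.1": 10, "2.2": 17}

print(strand_shares(cil_field_trial))
print(strand_shares(ct_field_trial))
```

This reproduces the CIL split of 12, 27, 49, and 12 percent and, for the CT field trial items, a 37/63 split between Strands 1 and 2. Because each strand is rounded independently, the shares need not sum to exactly 100.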


Field trial coverage of the CT framework 


All field trial items were developed according to, and mapped against, the ICILS CT framework. Table 2.7 shows this mapping.

Table 2.7: Field trial CT item mapping to the CT framework

CT framework aspect Items (n) Items (%) Score points (n) Score points (%)
1.1 Knowing about and understanding digital systems 3 16 7 16
1.2 Formulating and analyzing problems 3 16 5 12
1.3 Collecting and representing relevant data 3 16 4 9
2.1 Planning and evaluating solutions 4 21 10 23
2.2 Developing algorithms, programs, and interfaces 6 32 17 40
Total 19 100 43 100
While there was a similar number of items addressing each of Strands 1 and 2, the majority (63%) of score points collected related to Strand 2.
Selection of CIL and CT test content for main survey

As stated previously, the [...] survey test design and content. [...] task’s measurement properties. [...] included in the main survey instrument. Chapter 11 of this report describes the analysis procedures [...]. After consulting with NRCs, the ISC decided to retain all five CIL [...] survey.

Feedback from NRCs and observers of the field trial suggested that the total testing time was too long for students completing both the CIL and CT tests and that many students were finishing each CT module in far less than the allocated 30 minutes. As a result of this, it was agreed that the time allocated to each CT module would be reduced to 25 minutes and that the farm drone module would be shortened by removing two final constructed response items that required students to provide reflections on coding solutions. It was also agreed that the farm drone module would benefit from an increase in the number of algorithm debugging tasks. Two algorithm construction tasks from the field trial were converted to algorithm debugging tasks for use in the main survey.

Main survey test design and content

Test design

The main survey CIL test instrument consisted of five test modules with a total of 46 tasks which yielded data for 81 items. Some of these tasks generated a number of score points based on the criteria that were applied to the large tasks. Table 2.8 shows the composition of the main survey test modules by items derived from the different task types. The items shown in Table 2.8 are those that were used in the scaling and analysis of the ICILS 2018 main survey CIL test data. Overall, 27 percent of the items were derived from information-based response tasks, 22 percent from skills tasks, and 51 percent from authoring tasks.


The CT main survey test instrument consisted of two test modules with a total of 17 tasks which yielded data for 17 items. The CT test emphasized the application of skills, with 15 of the 17 items derived from skills tasks (Table 2.9).

Students received the test modules in the same fully balanced complete rotation that was used in the field trial. Table 2.10 shows this design for the main survey. As before, the term test form refers to each combination of modules used in the main survey.
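A fully balanced complete rotation can be generated by enumerating every ordered selection of modules. This sketch assumes that each test form pairs two distinct CIL modules in a fixed order, which gives 5 × 4 = 20 forms. The module order, the helper names, and the resulting form numbering are hypothetical, and the placement of the CT modules is not modeled.

```python
from collections import Counter
from itertools import permutations

# Module order follows Table 2.2; this order (and hence form numbering) is hypothetical.
CIL_MODULES = ["Band competition", "Breathing", "School trip", "Board games", "Recycling"]

def balanced_rotation(modules, per_form=2):
    """Return one test form per ordered selection of distinct modules."""
    return list(permutations(modules, per_form))

def position_counts(forms):
    """Count how often each module appears in each position across forms."""
    return [Counter(form[pos] for form in forms) for pos in range(len(forms[0]))]

forms = balanced_rotation(CIL_MODULES)
for number, form in enumerate(forms, start=1):
    print(f"Form {number:2d}: {form[0]} -> {form[1]}")
print(position_counts(forms))
```

In this construction each module appears four times in each position and each ordered pair appears exactly once, which is the sense in which such a rotation is balanced for module position and order.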
Table 2.8: Main survey CIL test module composition by items derived from tasks

Band competition 1 2 2 2 3 7 17
Breathing 0 2 0 3 0 10 15
School trip 1 0 0 4 1 7 13
Board games 2 4 0 1 1 5 13
Recycling* 3 5 0 1 2 12 23
Total 7 13 2 11 7 41 81
Note: *The recycling module included a short note-taking task and a communication task. In this table both are classified as large tasks.


Table 2.9: Main survey CT test module composition by items derived from tasks 


Automated bus 1 1 4 2 0 0 8


Farm drone 


Total | 1 1 5 | D 5 a 17 
Table 2.10: Main survey test form design and contents

R: Recycling; AB: Automated bus; FD: Farm drone


Main survey coverage of the CIL framework 


All main survey items were developed according to and mapped against the ICILS CIL framework. 
Table 2.11 shows this mapping.


Table 2.11: Main survey CIL item mapping to the CIL framework 


CIL framework aspect Items (n) Items (%) Score points (n) Score points (%)
1.1 Foundations of computer use 2 2 2 2
1.2 Computer use conventions 11 14 13 13
2.1 Accessing and evaluating information 14 17 16 16
2.2 Managing information 8 10 8 8
3.1 Transforming information 15 19 20 20
3.2 Creating information 21 26 31 30
4.1 Sharing information 7 9 8 8
4.2 Using information responsibly and safely 3 4 4 4
Total 81 100 102 100
A comparison of Tables 2.6 and 2.11 reveals that the final test instrument provided very similar
CIL framework coverage to that of the field trial instrument. 


The 81 main survey items yielded 102 score points for inclusion in the item analysis and scaling. 
Overall, 15 percent of score points were derived from items corresponding to Strand 1, 24 percent to Strand 2, 50 percent to Strand 3, and 12 percent to Strand 4.


Main survey coverage of the CT framework 


All main survey items were developed according to and mapped against the ICILS CT framework. 
Table 2.12 shows this mapping. 


Table 2.12: Main survey CT item mapping to the CT framework 


CT framework aspect Items (n) Items (%) Score points (n) Score points (%)
1.1 Knowing about and understanding digital systems 3 18 7 18
1.2 Formulating and analyzing problems 2 12 4 10
1.3 Collecting and representing relevant data 3 18 5 13
2.1 Planning and evaluating solutions 5 29 12 31
2.2 Developing algorithms, programs, and interfaces 4 24 11 28
Total 17 100 39 100


A comparison of Tables 2.7 and 2.12 reveals that the final test instrument provided similar CT framework coverage to that of the field trial instrument. The relative decrease in aspect 2.2 coverage and increase in aspect 2.1 coverage between the field trial and main survey instruments were largely a result of the conversion of two algorithm construction tasks in the field trial into algorithm debugging tasks in the main survey. Other small changes in the coverage by aspect related to the removal of two constructed response evaluation tasks from the farm drone module following the field trial.


The 17 main survey items yielded 39 score points for inclusion in the item analysis and scaling. 
Overall, 41 percent and 59 percent of score points were derived from items corresponding to Strands 1 and 2, respectively.


Scoring the main survey CT algorithm construction and algorithm debugging tasks 

CT items 

The CT test was new to ICILS and represented the first time CT has been assessed in a cross-national large-scale assessment context. The quality of students’ responses to visual coding tasks has not previously been assessed in this context and consequently we have included this section in the technical report to explain the conceptual basis and operationalization of the scoring of students’ visual code algorithms in ICILS 2018.


The algorithm construction tasks required students to select commands (available as visual code blocks) and place them in sequence in order for the farm drone to complete specified tasks. The tasks included any or all of the following actions: having the drone move to specified locations on a grid, having the drone drop any of water, seed, or fertilizer, and applying conditional logic relating to the size of the crops on which an action may be conducted (using a simple but configurable “if... do” command). After two simple warm-up tasks, students were also introduced to and able to make use of a configurable “repeat” command used to complete multiple iterations of commands.


The algorithm debugging tasks required students to edit an existing configuration of commands in order for the farm drone to complete specified tasks. The algorithm debugging tasks were developed such that one or two minor modifications to the command configuration could result in the drone completing the actions as required. However, there were no restrictions in the debugging tasks on the nature of the changes students could make to the commands and their configuration. Students could, if they wished, remove all the existing commands and develop a completely new set of code.


The tasks became progressively more complex as students worked through the module. Task complexity related to the number of actions needed to be completed by the drone and the number, type, and configuration of targets onto which the drone needed to drop any of water, seed, or fertilizer.


The quality of students’ coding solutions to both algorithm construction and debugging tasks was conceptualized in terms of two main criteria. The first criterion related to the correctness of any given solution and the second was the efficiency (or elegance) of the solution. Ultimately we derived measures for each criterion and then combined these into a single measure of the quality of the solution to each coding task. As we defined, specified, and operationalized the measures for each criterion, it was possible to have the computer-based delivery system generate the relevant data for each student response. So while the variables and scores used to measure the quality of coding were determined by the research team, the individual scores for student responses were ultimately generated through the test delivery system. The method for scoring each criterion and then combining the scores to form a single score for each task is described below.


Scoring the correctness of coding solutions 


The correctness of a solution was characterized by the degree to which the behavior of the drone in response to a student’s coding solution matched the required behavior of the drone specified in the task instructions. In a small number of simple tasks, the instruction was that the drone fly to a specified location. For these tasks, correctness was measured only in terms of whether or not the drone ended up at the specified location. For the majority of tasks the drone was required to both fly to specified locations and drop any of water, seed, or fertilizer on these targets. For these tasks, the correctness of the solution was initially scored according to three sub-criteria that comprised two variables (correct targets and incorrect targets):

i. the number of targets with the correct materials on them (correct targets: 2 score points per target with correct materials)

ii. the number of targets with incorrect materials on them (correct targets: 1 score point per target with incorrect materials)

iii. whether any materials were dropped on a square that was not a target (incorrect targets: 1 score point for no materials on any incorrect targets, 0 score points for any materials on any incorrect target).
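The two correctness variables can be computed directly from where the drone dropped materials. This is a minimal sketch, assuming grid squares keyed by coordinates and treating a target that holds any material other than the required one as a target “with incorrect materials”. The function `correctness_variables` and its argument layout are hypothetical, not the delivery system's implementation.

```python
from typing import Dict, Set, Tuple

Square = Tuple[int, int]  # hypothetical (column, row) grid coordinate

def correctness_variables(drops: Dict[Square, Set[str]],
                          targets: Dict[Square, str]) -> Tuple[int, int]:
    """Return (correct_targets_score, incorrect_targets_score).

    drops:   square -> set of materials the drone dropped on it
    targets: target square -> the one material required on it
    """
    correct_targets = 0
    for square, required in targets.items():
        dropped = drops.get(square, set())
        if dropped == {required}:
            correct_targets += 2  # sub-criterion i: only the correct material
        elif dropped:
            correct_targets += 1  # sub-criterion ii: incorrect material present
    # Sub-criterion iii: 1 point only if nothing landed on a non-target square.
    off_target = any(materials for square, materials in drops.items()
                     if square not in targets)
    incorrect_targets = 0 if off_target else 1
    return correct_targets, incorrect_targets

targets = {(1, 0): "water", (2, 0): "seed", (3, 0): "water", (4, 0): "fertilizer"}
response = {(1, 0): {"water"}, (2, 0): {"seed"}, (3, 0): {"seed"}}
print(correctness_variables(response, targets))
```

In the example, two targets carry only the correct material (2 points each), one carries the wrong material (1 point), one is empty, and nothing landed off target.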


Each response was first scored for each variable (correct targets and incorrect targets). Following this, the frequencies of the scores were examined together with consideration of the descriptions of combinations of these scores. This was an iterative process used to establish a scoring logic for the correctness of each task. This scoring logic included combining the correct targets and incorrect targets scores into a single initial combined correctness score for each task. Once the scoring logic for the correctness of each task was established, we completed a Rasch item response theory scaling analysis (Rasch 1960) (see Chapter 11 for further detail) to establish the measurement viability of the scoring logic and, where appropriate, we recoded scoring categories for selected items. Table 2.13 shows an example operationalization of this scoring logic for a task, together with the recoding of categories following initial scaling. For each task, the score for the completely correct response remained as a unique and highest score category. The recoding and combination of score categories for partially correct responses varied across tasks and was informed by the response frequencies, the substantive differences in response quality, and the degree to which these differences were reflected in the student data.


Table 2.13: Example scoring of the correctness of student coding responses

Response description (combining targets and incorrect targets) | Targets score | Incorrect targets score | Initial combined score | Recoded combined score
All 4 targets reached with only correct materials and no incorrect targets | 8 | 1 | 4 | 2
All 4 targets reached with only correct materials and one or more incorrect targets | 8 | 0 | 3 | 1
At least 2 targets with only the correct materials OR all 4 targets with the incorrect material, and no incorrect targets | 4, 5, 6, or 7 | 1 | 2 | 1
At least 2 targets with only the correct materials OR all 4 targets with the incorrect material, and one or more incorrect targets | 4, 5, 6, or 7 | 0 | 1 | 0
Fewer than 2 targets met | Fewer than 4 | 0 or 1 | 0 | 0
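The scoring logic in Table 2.13 can be expressed as a small function. The following is a minimal sketch for a four-target task only, not the study's scoring software; the function names and the `RECODE` mapping name are hypothetical, and it assumes each target is counted once, either as carrying correct materials or incorrect materials.

```python
# Hypothetical sketch of the Table 2.13 correctness logic for a four-target task.
# Assumption: each target is counted once, with either correct or incorrect materials.

def targets_score(n_correct: int, n_incorrect_material: int) -> int:
    """Sub-criteria i and ii: 2 points per target with correct materials,
    1 point per target with incorrect materials (maximum 8 for 4 targets)."""
    return 2 * n_correct + n_incorrect_material


def incorrect_targets_score(dropped_on_non_target: bool) -> int:
    """Sub-criterion iii: 1 point if no material landed on a non-target square."""
    return 0 if dropped_on_non_target else 1


def initial_combined_score(targets: int, incorrect_targets: int) -> int:
    """Initial combined correctness score (0-4), following Table 2.13."""
    if targets == 8:            # all 4 targets with only correct materials
        return 4 if incorrect_targets == 1 else 3
    if 4 <= targets <= 7:       # at least 2 correct targets, or all 4 with incorrect material
        return 2 if incorrect_targets == 1 else 1
    return 0                    # fewer than 2 targets met


# Recoding applied after the initial scaling (last column of Table 2.13).
RECODE = {4: 2, 3: 1, 2: 1, 1: 0, 0: 0}


def correctness_score(n_correct: int, n_incorrect_material: int,
                      dropped_on_non_target: bool) -> int:
    t = targets_score(n_correct, n_incorrect_material)
    i = incorrect_targets_score(dropped_on_non_target)
    return RECODE[initial_combined_score(t, i)]
```

Keeping the initial and recoded scores as separate steps mirrors the description above: the initial categories are established first, and the collapsing of partial-credit categories is applied only after the scaling analysis.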


Scoring the efficiency of coding solutions


The second criterion used to score the student coding responses related to the efficiency (or
elegance) of students’ responses. We explored two main approaches when considering the
efficiency of the code in students’ responses. The first was a simple count of the number of
commands in each response. For each task, the most efficient response was defined as one that
included the smallest number of commands needed to complete a completely correct response (for
each task in the farm drone module, students were instructed to use as few commands as possible
in their solution). For example, if a task required the drone to move forward five times, this was
most commonly achieved by students using either five “move forward” commands in sequence or
one “repeat” command (configured to 5 repeats) and one “move forward” command. The latter
solution, which is the most efficient, uses two commands. In theory other configurations are also
possible, such as one “move forward” command and a “repeat” command (configured to 4 repeats)
with an additional “move forward” command. This solution would use three commands. For each
task it was possible to identify clusters of the number of commands that most likely corresponded
to different approaches to solving the task. For each task we created an efficiency score hierarchy
based on the number of commands used in the solution and informed by the task requirements, the
likely characteristics of solutions that could be associated with different numbers of commands,
and the frequencies of the number of commands across the responses.

The second approach was to examine the degree to which students used embedded commands
within their code solutions, such as loops within loops. This was considered for its potential to
represent the elegance of the substance of a coding solution. For the more complex tasks it was
possible to observe some variation in the degree to which students embedded commands, and in
an early phase of the analysis we included an elegance variable derived from the degree to which
embedded commands were used.

Once we had established efficiency scores for responses to each task and elegance scores for
responses to the more complex tasks, we included these data in the scaling model (see Chapter
11). We found that the data from the elegance variable did not contribute to the quality of the
measurement of student CT and at this point decided to use only the efficiency data to contribute
to the scoring. Table 2.14 shows an example operationalization of the scoring logic in establishing
the efficiency scores for a task, together with the recoding of categories following initial scaling.


Table 2.14: Example scoring of the efficiency of student coding responses 


Comment on efficiency | Commands | Responses (%)* | Code
Minimum necessary commands | 6 | 28 | 3
Repeat with additional commands | 7-10 | 16 | 2
14 commands was the minimum necessary if not using the repeat | 11-15 | 41 | 1
More than 15 commands showed a good deal of redundancy | >15 | 6 | 0


Note: * These percentages include all responses using the given number of commands regardless of their correctness. 
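Counting commands in a nested program and mapping the count to the codes in Table 2.14 can be sketched as follows. This is a minimal illustration, not the AM Examiner data format: the representation of a program as a list of command strings and `("repeat", n, body)` tuples is a hypothetical assumption, and the thresholds apply only to the example task in Table 2.14 (other tasks had their own hierarchies). The nesting-depth helper corresponds to the elegance variable that was explored and later dropped.

```python
from typing import List, Tuple, Union

# Hypothetical program representation: a command is either a plain string
# (e.g. "forward", "drop water") or a ("repeat", times, body) tuple.
Command = Union[str, Tuple[str, int, list]]


def count_commands(program: List[Command]) -> int:
    """Each command counts once; a repeat also counts the commands in its body."""
    total = 0
    for cmd in program:
        total += 1
        if isinstance(cmd, tuple):
            _, _, body = cmd
            total += count_commands(body)
    return total


def nesting_depth(program: List[Command]) -> int:
    """Deepest level of embedded repeats (loops within loops)."""
    depth = 0
    for cmd in program:
        if isinstance(cmd, tuple):
            depth = max(depth, 1 + nesting_depth(cmd[2]))
    return depth


def forward_moves(program: List[Command]) -> int:
    """Number of forward moves the drone actually makes when the program runs."""
    moves = 0
    for cmd in program:
        if isinstance(cmd, tuple):
            _, times, body = cmd
            moves += times * forward_moves(body)
        elif cmd == "forward":
            moves += 1
    return moves


def efficiency_code(n_commands: int) -> int:
    """Efficiency code for the example task in Table 2.14."""
    if n_commands <= 6:
        return 3
    if n_commands <= 10:
        return 2
    if n_commands <= 15:
        return 1
    return 0
```

Using the "move forward five times" example from the text, five separate commands, a single five-fold repeat, and a forward command followed by a four-fold repeat all move the drone five squares, but use 5, 2, and 3 commands respectively.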


Combining the correctness and efficiency scores to establish a single score for each coding task

The final step of establishing scores for each coding task was to combine the correctness and 
efficiency scores. This was done by using the efficiency score for each task to adjust the correctness 
score. However, we decided that the efficiency of a solution should only be used to adjust the 
highest solution score for each task. Operationally this meant that the efficiency of code was not 
considered to be important in evaluating the quality of incorrect or partially correct responses, 
but that it was considered to be important in identifying differences in the quality of fully correct 
responses. 


We created a new composite variable (correctness and efficiency) for each coding task. For this
variable, the score for fully correct responses was adjusted such that:

• Fully correct responses with the highest efficiency score were adjusted to one score point
higher than the highest correctness score;

• Fully correct responses with a partial credit (moderate) efficiency score were not adjusted;
and

• Fully correct responses with a zero efficiency score were adjusted to one score point lower
than the highest correctness score.
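The three adjustment rules can be written as a single function. This is a minimal sketch, assuming (as in Tables 2.13 to 2.15) a highest correctness score of 2 and a highest efficiency code of 3; the function and parameter names are hypothetical.

```python
from typing import Optional


def combined_score(correctness: int, efficiency: Optional[int],
                   max_correctness: int = 2, max_efficiency: int = 3) -> int:
    """Correctness-and-efficiency (CE) composite score for one coding task.

    Efficiency only adjusts fully correct responses; for partially correct or
    incorrect responses it is ignored (and may be None).
    """
    if correctness < max_correctness:
        return correctness              # partial or incorrect: unchanged
    if efficiency == max_efficiency:
        return max_correctness + 1      # most efficient fully correct solution
    if efficiency == 0:
        return max_correctness - 1      # fully correct but highly redundant code
    return max_correctness              # moderate efficiency: not adjusted
```

With these defaults, a fully correct but highly redundant solution receives the same composite score as a partially correct one, which matches the recoded combined scores shown in Table 2.15.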


The correctness scores for all partially correct or incorrect responses were unchanged when
included in the correctness and efficiency composite variable. Table 2.15 shows an example of how
correctness and efficiency scores were combined to form the single composite variable.


Released CIL test module and CT tasks 


One CIL test module, band competition, has been released since publication of the ICILS
international report (Fraillon et al. 2020). This module required students to work on a sequence of
tasks associated with planning a website to promote a band competition within a school. The large
task for this module required students to add content to a page on the band competition website.


The large task in this module presented students with a description of the task details as well as 
information about how the task would be assessed. The description was followed by a short video 
designed to familiarize students with the task and highlight the main features of the software they 
would need to use to complete the task. A detailed description of the module appears on pages 60 
to 74 of the ICILS 2018 international report (Fraillon et al. 2020). 
Table 2.15: Example scoring of correctness and efficiency combined

Correctness score | Efficiency score | Conceptual description of correctness and efficiency | Recoded combined score (CE)
2 | 3 | All 4 targets using only the correct material with no irrelevant targets using no more than 6 commands | 3
2 | 2 or 1 | All 4 targets using only the correct material with no irrelevant targets using between 7 and 15 commands | 2
2 | 0 | All 4 targets in one row reached using only the correct material with no irrelevant targets using more than 15 commands | 1
1 | N/A | At least 2 targets with only the correct materials and no irrelevant targets using any number of commands OR all 4 targets with the incorrect material and no irrelevant targets using any number of commands | 1
0 | N/A | Fewer than 2 targets met | 0


Four tasks from the CT test have also been released. Two tasks were taken from the automated
bus module and show the use of a simulation configuration of a decision tree. Two tasks were
taken from the farm drone module and show algorithm construction and algorithm debugging.
Detailed descriptions of these tasks appear on pages 97 to 101 of the ICILS international report
(Fraillon et al. 2020).
CHAPTER 3:


Computer-based assessment systems 


Julian Fraillon, Ralph Carstens, and Sebastian Meyer 


Introduction 


ICILS 2018 collects student achievement and questionnaire data on computer rather than on
paper. This chapter describes the key aspects of the computer-based test delivery system used
in ICILS 2018. It also details some of the challenges of using computer-based systems to collect
student data in a cross-national large-scale assessment such as ICILS 2018.

The focus of the chapter is on the overall approach, architecture, and design of the computer-based
assessment (CBA) system suite as well as the relationship of these to other technical systems
used for the survey operations. Details and procedural aspects are provided in other chapters:
assessment and test design in Chapter 2, translation and adaptation in Chapter 5, field operations
in Chapter 8, data flow and integration in Chapter 10, and scaling and analysis of the test materials
in Chapter 11.


ICILS 2018 computer-based components and architecture


Pre-existing components


IEA has been using computer-based systems for a number of years in order to organize and support
countries in coordinating field operations and collecting questionnaire data from teachers and
schools. These systems, developed by IEA, include the following:

• The IEA Windows Within-School Sampling Software (IEA WinW3S), which supports countries in
managing within-school sampling and test administration procedures;
• The IEA Online Survey System (IEA OSS), which supports the translation, adaptation, and
subsequent delivery of computer-based questionnaire material;
• The IEA Data Management Expert (IEA DME), which is used to capture data from paper-based
questionnaire material and to integrate and verify national databases;
• The IEA Coding Expert, which is used to code the student responses with respect to parental
occupation; and
• The IEA Translation System, which is used to translate and review the school, ICT coordinator,
and teacher questionnaires online.

All five software components were used in ICILS 2018 once they had undergone significant
customization and/or extension so that they would suit the specific study context. Data from
paper- and computer-based test and questionnaire components were transformed and integrated
into the pre-existing IEA Data Processing Expert software. Chapter 10 provides more information
on this process.

As in ICILS 2013, RM Results (previously SONET Systems) provided the software components
directly relating to the collection of student data via computer for ICILS 2018. These software
components are part of a broader suite known as AssessmentMaster (AM). The software modules
specifically relating to the delivery of the ICILS 2018 student test and questionnaire include an
administration module (AM Manager), a translation module (AM Designer), a delivery engine
module (AM Examiner), and a scoring module (AM Marker).

Three ICILS 2018 student test modules were already administered in ICILS 2013. They are usually
referred to as trend modules.
Components developed for ICILS 2018

Apart from general updates and improvements in all software packages used in ICILS 2018, four
new test modules were introduced: two modules for the computer and information literacy (CIL)
test and the two new modules comprising the computational thinking (CT) test.


Procedures and software user manuals


A series of detailed manuals guides the work of national center staff during IEA studies. These
manuals are known as Survey Operations Procedures (SOP) manuals. IEA, in close cooperation
with the entire international team, has developed and refined instructions and guidelines for IEA
WinW3S, IEA OSS, IEA DME, and the IEA Translation System and included these in the suite of
SOP materials. Each user manual is tailored to the needs of each project (see Chapter 8 for details).

During ICILS 2018, a separate user manual was developed for the translation module, while
instructions on how to use the scoring and administration modules were included within the SOP
materials. The manuals relating to the delivery engine module were incorporated in the manuals
for school coordinators and test administrators who were administering the tests in schools.


ICILS 2018 system architecture 


Table 3.1 provides an overview of the ICILS 2018 computer-based system and its supporting and
accompanying systems. The table also shows how these systems interfaced and interacted. The
table is organized by the major components of the system and the ICILS 2018 target populations.

As can be seen, ICILS 2018 used a mixture of pre-existing and newly developed and/or customized
components and tools to support the preparation, administration, and post-processing of the study.


Developing the computer-based delivery platform for ICILS 2018 


Developing the delivery engine 


Development of the computer-based delivery platform took place in parallel with test development
(see Chapter 2 for details of the test development). The three trend test modules, i.e., the ones
administered in ICILS 2013, were already available and largely ready-to-use. In response to
some changes in the delivery system, AM Examiner, RM Results updated aspects of the backend
functioning of the trend modules. These changes did not affect the user experience of the modules
in 2018 in comparison to 2013. Two new CIL modules and two CT modules were developed for
ICILS 2018. Once the new test modules’ storyboards had been completed, they were sent to RM
Results for authoring into the delivery platform. This process was an iterative one wherein the
necessary functionality of each task in the test was reviewed and refined once it had been enacted.
Each task underwent many such iterations.


The test delivery engine was web-based, which meant that technically the tests could be delivered
over the internet. However, the ICILS 2018 research team decided, for operational reasons, that the
tests should be administered on USB drives (one per computer), each containing a self-contained
web-server to deliver the web-based test content. Although the USB-based delivery engine was
Windows-based, it could be run using some forms of Windows emulation on Mac OS X or Linux.

The USB hosted a portable version of the Mozilla Firefox browser. Accordingly, all ICILS 2018
materials were developed, rendered, and tested using Mozilla Firefox and Google Chrome. The
research team recommended that the translation, scoring, and administration modules be accessed
only through Firefox (although Google Chrome was also an option). As part of the field-operation
procedures, national centers needed to ensure that session data from each USB drive were
uploaded to a central database as soon as practicable after each test session.




Table 3.1: ICILS computer-based or computer-supported systems and operations

Process | Student test and questionnaire | Principal, ICT coordinator, and teacher questionnaires (online) | Principal, ICT coordinator, and teacher questionnaires (paper)
Authoring/master instruments | AssessmentMaster authoring from storyboards | IEA OSS admin module, transfer from IEA Translation System | Microsoft Word, direct authoring of paper-based questionnaires
Translation | AM Designer, web-based | IEA Translation System | Microsoft Word, desktop-based
Sample selection and ID provisioning | IEA WinW3S (all instruments)
Delivery systems | AM Examiner | IEA OSS delivery module, web-based | Personalized questionnaire prints
Initialization | Student ID, password, and language information from IEA WinW3S | Respondent ID and password from IEA WinW3S | Labels with respondent ID from IEA WinW3S
Primary delivery mode | USB sticks: Windows-based on local school computers; proctored | Any internet browser: self-administered | Paper: self-administered
Alternative delivery mode(s) | Laptop server mode or carry-in laptop sets (where school infrastructure insufficient) | Paper questionnaires (where infrastructure insufficient) | N/A
Data capture | Directly onto USB sticks; alternatively, local laptop servers (student by student) | Directly into central database (all respondents) | Manual (human) data entry using IEA DME software after administration (all respondents)
Data merging (across respondents) | Upload to AM Manager | N/A | Merging from multiple IEA DME databases (where used)
Data scoring | AM Marker | N/A | N/A
Data management | IEA WinW3S; crosscheck of expected versus available data (all instruments)
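The "crosscheck of expected versus available data" step in Table 3.1 amounts to comparing the set of respondents expected from the sampling records with the set for which data actually arrived. The sketch below is a hypothetical illustration of that comparison using Python sets; it does not reproduce how IEA WinW3S performs the check, and the function name and ID format are assumptions.

```python
from typing import Dict, Iterable, List


def crosscheck(expected_ids: Iterable[str],
               uploaded_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Compare expected respondents (e.g. from sampling records) with uploaded data.

    missing:    expected but no data received (to follow up with the school)
    unexpected: data received for an ID that was not expected (possible ID error)
    matched:    expected and received
    """
    expected, uploaded = set(expected_ids), set(uploaded_ids)
    return {
        "missing": sorted(expected - uploaded),
        "unexpected": sorted(uploaded - expected),
        "matched": sorted(expected & uploaded),
    }
```

Reporting unexpected IDs separately from missing ones matters because the two usually point to different problems: absent or lost sessions versus identification errors.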


An alternative delivery option was made available for use in the main survey. Under this system,
the delivery engine was installed on a notebook computer connected to a school LAN. The school
could then access the test locally over the school LAN via Mozilla Firefox or Google Chrome.


School coordinators were provided with the following list of minimum specifications for computers 
to run the ICILS student test from USB sticks. 


• Screen resolution of at least 1024x768;

• One of the following operating systems: Windows XP (Service Pack 3)/Windows Vista/Windows
7/Windows 8/Windows 10;

• Display (fonts) set to the optimal size (minimum: 100%; the system font size can be set from
the computer's Control Panel/Appearance and Personalization/Display); and

• A USB 2.0 (or higher) port.
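A school coordinator (or a hypothetical pre-check script) could test a machine against the USB-mode minimum specifications listed above. The following is a minimal sketch: the `ClientMachine` fields and the operating system labels are assumptions made for illustration, and the check follows only the listed minimums (it does not cover the Windows emulation route mentioned earlier).

```python
from dataclasses import dataclass
from typing import List

# Operating systems from the USB-mode minimum specifications (labels are illustrative).
SUPPORTED_OS = {"Windows XP SP3", "Windows Vista", "Windows 7", "Windows 8", "Windows 10"}


@dataclass
class ClientMachine:
    screen_width: int
    screen_height: int
    os_name: str
    font_scale_percent: int
    usb_version: float


def usb_mode_problems(m: ClientMachine) -> List[str]:
    """Return a list of unmet minimum requirements (empty list means the machine passes)."""
    problems = []
    if m.screen_width < 1024 or m.screen_height < 768:
        problems.append("screen resolution below 1024x768")
    if m.os_name not in SUPPORTED_OS:
        problems.append(f"unsupported operating system: {m.os_name}")
    if m.font_scale_percent < 100:
        problems.append("display font size below 100%")
    if m.usb_version < 2.0:
        problems.append("USB port older than 2.0")
    return problems
```

Returning all problems at once, rather than stopping at the first failure, lets a coordinator fix a machine in a single pass.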


School coordinators were provided with the following list of minimum specifications for computers 
to run the ICILS student test using the local server method (LAN). 
Requirements for the server computer:

• Screen resolution of at least 1024x768;
• One of the following operating systems: Windows 7/Windows 8/Windows 10 (Windows 10 was
recommended for the best experience);
• A recommended CPU of 1500 MHz;
• Installed system memory of at least 4 GB;
• At least 10 GB of free storage space on the hard drive;
• LAN or Wi-Fi connectivity; and
• Internet connectivity (only required for data upload).


Requirements for the client computers:

• Screen resolution of at least 1024x768;
• Connectivity to the same LAN or Wi-Fi network as the server computer;
• A recent version (less than 12 months old) of the Mozilla Firefox or Google Chrome browser
installed; and
• Display (fonts) set to the optimal size (minimum: 100%; the system font size can be set from
the computer's Control Panel/Appearance and Personalization/Display).

The delivery module was developed to include the following features:

• Delivery of multiple choice, constructed response, and drag and drop items;
• Delivery of linear and nonlinear skills tasks (based on software simulations with real-world
responsiveness);
• Delivery of large/authoring tasks (based on live software applications with functionality to add,
edit, and format text in a range of formats);
• Display of closed web environments with the facility to display multiple pages and navigate
between and within pages;
• Capture of all student final responses to each task;
• Capture of time taken on each task;
• Facility to score student responses to skills tasks;
• A progress display (number of tasks completed and to come per module) for students; and
• A countdown timer for students per module.
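Three of the delivery features above (capture of time taken on each task, a progress display, and a per-module countdown timer) can be illustrated with a small session object. This is a hypothetical sketch, not AM Examiner's implementation: the `ModuleSession` class is an assumption, it handles only forward, linear navigation, and it takes an injectable clock so that timing can be tested deterministically.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


class ModuleSession:
    """Hypothetical per-module session: time on task, progress, and countdown."""

    def __init__(self, task_ids: Iterable[str], time_limit_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.task_ids = list(task_ids)
        self.time_limit_s = time_limit_s
        self.clock = clock
        self.started = self.clock()
        self.task_started = self.started
        self.current = 0                          # index of the task on screen
        self.responses: Dict[str, str] = {}       # final response per task
        self.time_on_task: Dict[str, float] = {}  # seconds spent per task

    def submit(self, response: str) -> None:
        """Store the response for the current task and move to the next one."""
        now = self.clock()
        task = self.task_ids[self.current]
        self.responses[task] = response
        self.time_on_task[task] = self.time_on_task.get(task, 0.0) + (now - self.task_started)
        self.current += 1
        self.task_started = now

    def progress(self) -> Tuple[int, int]:
        """(tasks completed, tasks still to come) for the progress display."""
        return self.current, len(self.task_ids) - self.current

    def remaining_time(self) -> float:
        """Seconds left on the module countdown timer (never negative)."""
        return max(0.0, self.time_limit_s - (self.clock() - self.started))
```

For example, with a 30-minute module where a student spends 40 seconds on the first task and 60 on the second, `progress()` reports two tasks done and one to come, and the countdown shows 1700 seconds remaining.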


Developing the translation systems 


In ICILS 2018 two different translation systems were used: the IEA Translation System and AM
Designer. This reflects the fact that two delivery systems were also used for the electronic
administration of ICILS 2018 instruments. The school, teacher, and ICT coordinator questionnaires
were administered via the IEA OSS (if not administered on paper) while the student tests and
questionnaires were administered using AM Examiner.

Both translation systems are web-based applications accessed through a web browser. Users
needed the following in order to access them:

• A computer with a broadband or equivalent high-speed internet connection; and
• A web browser, e.g., Mozilla Firefox or Google Chrome.

The translation systems were developed to provide the following features:


• Some selective functionality relating to the role of the user (e.g., administrator, translator,
translation reviewer, translation verifier, verification reviewer);
• Opportunity to enter translated text for all screen elements;
• Ability for those countries where the test would be administered in more than one language to
select the target language;
• Ability to view the translation history of a text element;
• Ability to view the source text, the translated text, and any previous revisions of the translated
text, and to compare translated versions;
• A comments interface to allow translators, reviewers, and verifiers to enter comments linked
to elements of text;
• Opportunity to view “live” translated versions of the tasks at any point during the translation
and to simultaneously view (in a separate window) a live version of the source task;
• Ability to enter text as plain text (including editable HTML tags) or by using formatting tools;
• Ability to view translated text elements as plain text or as rendered HTML;
• Ability to search for and/or selectively bulk-replace text within and across tasks;
• Ability to bulk-import the field trial translations so that they could be used as a starting point
for main survey translations; and
• Opportunity for translators, reviewers, and verifiers to monitor the progress of task completion
(i.e., translation state).
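Keeping a revision history for each text element and comparing two revisions can be sketched with Python's standard `difflib` module. This is an illustration only: neither translation system is documented here as using `difflib`, and the `TranslatedElement` class and the German example strings are hypothetical.

```python
import difflib
from typing import List


class TranslatedElement:
    """Hypothetical text element with its source text and saved translation revisions."""

    def __init__(self, source_text: str):
        self.source_text = source_text
        self.revisions: List[str] = []

    def save(self, translated_text: str) -> None:
        """Append a new revision; earlier revisions are kept as history."""
        self.revisions.append(translated_text)

    def compare(self, older: int, newer: int) -> List[str]:
        """Unified diff between two saved revisions (0-based indices)."""
        return list(difflib.unified_diff(
            self.revisions[older].splitlines(),
            self.revisions[newer].splitlines(),
            fromfile=f"revision {older + 1}",
            tofile=f"revision {newer + 1}",
            lineterm="",
        ))
```

Lines prefixed with `-` and `+` in the result show what a reviewer or verifier changed, while unchanged lines appear as context.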


Both systems included the functionalities listed above. The school, teacher, and ICT coordinator
questionnaires were translated and verified within the IEA Translation System while the student
instruments (test and questionnaires) were translated and verified within AM Designer.


Developing the scoring module

The ICILS 2018 scoring module, AM Marker, was an adaptation of the ICILS 2013 scoring module.
This adaptation included the development of some new features for use in ICILS 2018. It also
included replacing (where possible) text in the user interface with icons so that the user interface
did not need to be translated. (ICILS 2018 did not require scorers to be proficient in English.)

The web-based scoring application could be accessed through a web browser. In order to access
the application, users needed:

• A computer with a broadband or equivalent high-speed internet connection; and
• A recent version (12 months old or less) of Mozilla Firefox or Google Chrome.

AM Marker was developed to include the following features:


• Selective functionality relating to the role of the user (e.g., administrator, scoring trainer, team
leader, scorer);
• Facility to specify a proportion (in ICILS 2018, 20%) of student responses to be blind
double-scored for the purpose of monitoring inter-rater reliability; and
• Capacity to begin scoring before all student responses had been uploaded to the system (i.e.,
before completion of data collection in schools) without compromising the double-scoring
procedure.
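One way to select a fixed proportion of responses for blind double scoring while scoring starts before all uploads are complete is to decide each response independently from a hash of its ID, so the decision never depends on which other responses have already arrived. The sketch below illustrates that idea under stated assumptions; it is not how AM Marker is documented to work, and the function names, the salt, and the use of exact agreement as the reliability statistic are hypothetical.

```python
import hashlib
from typing import Sequence


def selected_for_double_scoring(response_id: str, proportion: float = 0.20,
                                salt: str = "example-salt") -> bool:
    """Deterministically flag about `proportion` of responses for a second, blind scoring.

    The decision depends only on the response ID (and a fixed salt), so it is stable
    no matter when the response is uploaded or how many responses exist so far.
    """
    digest = hashlib.sha256(f"{salt}:{response_id}".encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # pseudo-uniform value in [0, 1)
    return u < proportion


def exact_agreement(first: Sequence[int], second: Sequence[int]) -> float:
    """Share of double-scored responses where both scorers gave the same score."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty score lists")
    return sum(a == b for a, b in zip(first, second)) / len(first)
```

Because selection is decided per response at upload time, early and late uploads are sampled at the same rate, which is the property the third feature above requires.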


Scorers were able to: 


• View tasks (as they appeared to students in the test) along with student responses on screen;
• Enter a score for each student for each task;
• Flag pieces of work for follow-up with a more senior staff member;
• Navigate back to previously scored pieces of work and amend scores;
• View large tasks with full functionality; and
• Enter a “training” mode to score pre-scored work and receive feedback on scoring accuracy.


In addition, team leaders could: 


• Review (check-score and amend) scorers’ scores; and
• Monitor and respond to flagged pieces of work.

In addition, scoring trainers could:

• Select, order, and annotate responses for use in scorer training.

In addition, scoring administrators could:

• Allocate scorers to teams; and
• View reports of interscorer reliability.
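The role-dependent functionality described above can be modelled as a mapping from roles to permitted actions. This is a hypothetical sketch: the action names are invented labels for the listed abilities, and it assumes that each senior role holds all scorer abilities plus its own additions, which is one reading of "In addition" rather than a documented rule.

```python
from typing import Dict, FrozenSet

# Abilities listed for scorers (action names are illustrative labels).
SCORER: FrozenSet[str] = frozenset({
    "view_tasks", "enter_scores", "flag_work",
    "amend_own_scores", "view_large_tasks", "training_mode",
})

# Assumption: each role inherits the scorer abilities plus its listed additions.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "scorer": SCORER,
    "team_leader": SCORER | {"review_scores", "respond_to_flags"},
    "scoring_trainer": SCORER | {"prepare_training_sets"},
    "administrator": SCORER | {"allocate_teams", "view_reliability_reports"},
}


def can(role: str, action: str) -> bool:
    """True if the given role may perform the action; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role, frozenset())
```

Defaulting unknown roles to an empty permission set keeps the check closed by default.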


Developing the administration module 


The ICILS 2018 administration module, AM Manager, was developed to (i) help national center
staff monitor the progress of test sessions (i.e., data upload), create user accounts, and define
roles for users of the other modules (scoring and translation), and (ii) export test-session data for
importing into IEA WinW3S.


The administration module was a web-based application that users accessed through a web
browser. In order to access the application, users needed:

• A computer with a broadband or equivalent high-speed internet connection; and
• A recent version (12 months old or less) of Mozilla Firefox or Google Chrome.


Development of the administration module focused on ensuring that users could: 


• Create user accounts and allocate roles;
• Download the software image of the ICILS 2018 test (including the test delivery engine);
• View national test session details;
• Monitor national test session status; and
• Export test session data for IEA WinW3S.


Challenges with computer-based delivery in ICILS 2018


Successful collection of data in large-scale computer-based surveys relies on individual participants
being able to provide responses on computers in controlled uniform environments from which data
can be recorded, organized, and stored for later use. When the data being collected are assessment
data, it is imperative that all respondents experience the tasks in an identical manner.

Uniformity of test presentation … present additional challenges …
Table 3.2: Unweighted participation status percentages across participants based on full database and
original coding by test administrators (reference: all sampled students)

Participation status | Test | Questionnaire
Participated | 89.2% | 87.2%
Left school permanently | 1.3% | 1.3%
Parental permission denied | 2.6% | 4.2%
Absent | 6.4% | 7.0%
Incompatible or failed equipment before assessment | 0.1% | 0.1%
Technical failure during assessment | 0.3% | 0.1%
USB stick lost or upload failed after assessment | 0.1% | 0.1%
Total | 100% | 100%


Table 3.2 shows that test and questionnaire data were collected from 89.2 percent and 87.2
percent of sampled students respectively. The slightly lower percentage of data collected for the
questionnaire is a result of two factors:

• Absence (0.6% more students were absent for the questionnaire than for the test); and
• In some countries parents gave permission for their children to complete the test but not
the questionnaire (resulting in a further difference of 1.6%).
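The gap between the two participation rates can be checked against the rows of Table 3.2 with simple arithmetic. The sketch below uses only the values for participation, parental permission, absence, and technical failure; it shows that the two factors named above add up to 2.2 percentage points, slightly more than the 2.0-point gap, with the remainder offset by the lower rate of technical failures for the questionnaire (0.1% versus 0.3%). Variable and function names are illustrative.

```python
# Percentages of all sampled students, taken from Table 3.2 and the text above.
test = {"participated": 89.2, "permission_denied": 2.6, "absent": 6.4, "technical_failure": 0.3}
questionnaire = {"participated": 87.2, "permission_denied": 4.2, "absent": 7.0, "technical_failure": 0.1}


def extra_loss(row: str) -> float:
    """Additional percentage points lost for the questionnaire relative to the test."""
    return round(questionnaire[row] - test[row], 1)


# Overall gap in data collected (test minus questionnaire).
participation_gap = round(test["participated"] - questionnaire["participated"], 1)

# Absence and permission add to the gap; fewer technical failures reduce it.
explained = round(extra_loss("absent") + extra_loss("permission_denied")
                  + extra_loss("technical_failure"), 1)
```

Rounding to one decimal place avoids spurious floating-point differences when comparing values that are themselves reported to one decimal.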
use of ICT at home), and their individual characteristics. Factors related to each of these different
levels shape the way students respond to learning about computers and computing. 


Contextual influences on CIL and CT learning are conceived as either antecedents or processes 
(Figure 4.1). Antecedents refer to the general background that affects how CIL and CT learning 
takes place (e.g., through context factors such as ICT provision and curricular policies that shape 
how learning about ICT is provided). Process-related variables are those factors shaping CIL and 
CT learning more directly (e.g., the extent of opportunities for CIL and CT learning during class, 
teacher attitudes toward ICT for study tasks, and students’ computer use at home). 


Figure 4.1: Contexts for ICILS 2018 CIL/CT learning outcomes

[Diagram with three columns: Antecedents, Processes, Outcomes. Recoverable box labels: School/classroom characteristics; Stated ICT curriculum; ICT resources; School/classroom ICT use for teaching/learning; CIL/ICT instruction; Student learning process; Student characteristics.]


In reference to this conceptual framework structure, variables collected through contextual 
instruments with examples of different types of measures are displayed in Table 4.1 below, where 
columns contain antecedents and processes and rows the four levels. The student questionnaire 
collected data on student experience, use, and perceptions of ICT as well as contextual factors at the 
individual (either school or home) level. The teacher, principal, and ICT coordinator questionnaires 
focused on gathering data to be used at the school level while the national contexts survey and 
published sources provided variables at the system or national level. 
Table 4.1: Mapping of variables to contextual framework with examples

Level of ... | Antecedents | Processes
Wider community | NCS & other sources: Structure of education; Accessibility of ICT | NCS & other sources: Role of ICT in curriculum
School/classroom | PrQ, ICQ, & TQ: School characteristics; ICT resources | PrQ, ICQ, TQ, & StQ: ICT use in teaching and learning; CIL/CT instruction
Student | StQ: Gender; Age | StQ: ICT activities; Use of ICT
Student outcome: CIL/CT
Home environment | StQ: Parent socioeconomic status; Home ICT resources | StQ: Learning about ICT at home

Notes: NCS = national contexts survey; PrQ = principal questionnaire; ICQ = ICT coordinator questionnaire;
TQ = teacher questionnaire; StQ = student questionnaire.
• About you: This section included questions about the student’s age, gender, and expected
education.

• Your home and your family: These questions focused on characteristics of the students’ homes
(including ICT resources) and their parents’ occupations and educational backgrounds.
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Development of the teacher questionnaire 
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The teacher questionnaire was designed to collect contextual information about school and 
classroom contexts for ICT learning, use of ICT for teaching and learning, teacher views on the 
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e Your use of ICT: These questions focused on teachers’ experience of ICT, their frequency of use 

of ICT, and their confidence in performing ICT tasks. 
e Your use of ICT in teaching: These questions asked teachers to name a reference class, provide 
information about the subject taught in that class, and state whether they used ICT for teaching 
and learning activities in this class. Those teachers who said they used ICT were asked to 
indicate the emphasis given to the development of Cll and CT-related capabilities and their 
use of ICT for various class activities and teaching practices. They were also asked to indicate 
the frequency with which they used various ICT tools in their teaching of this class. 


• In your school: The questions in this section asked teachers about their views on using ICT
in teaching and learning. The questions also asked teachers about provision for and practices
concerning the use of ICT in their school.

• Learning to use ICT in teaching: This section asked teachers whether their initial teacher
education included learning to use ICT and whether they had participated in ICT-related
professional learning.

• Approaches to teaching: This section asked teachers to indicate the extent to which they agreed
with a series of statements about using ICT in teaching and learning at school.


Development of the school principal and ICT coordinator questionnaires

The school questionnaires were designed to collect information about the school context in
general and the use of ICT in teaching and learning in particular. Two questionnaires were used to
collect this information: one directed to the school principal and the other to the school ICT
coordinator. Both questionnaires were delivered online by default, but an alternative paper-based
version was available for respondents who were unable or unwilling to complete the questionnaire
on a computer.


Factors relating to the school context included school characteristics (such as school size,
management, and resources), the availability of ICT resources, professional development regarding
ICT use for teachers, and expectations for ICT use and learning.


The questionnaire for school principals was designed to be completed in 15 minutes. The questions 
addressed school characteristics as well as school principals’ perceptions of ICT use for teaching 
and learning at their schools. 


The ICT coordinator questionnaire was designed to be answered in 10 minutes. It included
predominantly objective questions about the school’s ICT resources and its ICT-related processes
and policies.


The school principal questionnaire for the field trial included 18 questions with a total of 92 items, 
and was administered to 340 principals from the 14 countries that participated in the field trial. In 
most countries, about 24 school principals provided responses to the field trial questionnaire. The 
ICT coordinator questionnaire for the field trial consisted of 17 questions with a total of 91 items 
and was completed by 327 ICT coordinators at participating schools in 14 countries. 


The analyses of field trial data focused on providing empirical evidence that would assist the
selection of the main survey material. However, the relatively small number of responses in each
participating country (each school provided at most one response to each questionnaire) meant
that analyses of the field trial data gathered by the two questionnaires were limited in scope.


The ISC research team discussed the results of the school questionnaire field trial with NRCs 
before selecting the items that would be included in the final main survey instrument. Revisions 
made after the field trial included a rewording of some of the items. 
The review of the field trial outcomes led to a reduction in the number of questions in the school
principal questionnaire. The final version consisted of 15 questions with a total of 94 items spread
across the following four sections:


• About you and your use of ICT: This section asked school principals about their gender and ICT
use.


• Your school: This section contained questions about school size, grades taught at the school,
community size, and school management.


• ICT and teaching in your school: This section consisted of questions about the importance assigned
to ICT use at school, monitoring of ICT use by teachers, and expectations about teacher use of
ICT.

• Management of ICT in your school: This section contained questions about ICT management,
ICT-related procedures, ICT-related professional development for teachers, and priorities
for ICT use in teaching and learning.


The final ICT coordinator questionnaire comprised 15 questions (including two optional questions) 
with a total of 87 items (including 11 optional items). It contained the following three sections: 


• About your position: This section asked ICT coordinators about their position at school and their
school’s experience with computers for teaching and learning.


• Resources for ICT: This second section included questions on the ICT equipment available at
school.


• ICT support: This section consisted of questions on the support provided for ICT use at school
and/or the extent to which a lack of resources was hindering that use.


Development and implementation of the national contexts survey 


The ways in which students develop CIL and CT are potentially influenced by factors located at the
country or national context level. These variables include, among others, the education system in
general as well as policies on, and the curricular background of, CIL and CT education. The national
contexts survey was designed to collect relevant data and information about both antecedents and
processes at the country level. The experience of studies such as the Second Information Technology
in Education Study (SITES) 2006 (Plomp et al. 2009), the US Department of Education (2011)
study of educational technology, and ICILS 2013 (Fraillon et al. 2014) informed the development
of the national contexts survey.


ICILS staff at the ISC at ACER organized the development and coordination of the national 
contexts survey as well as the analyses, verification, and reporting of the data collected by this 
instrument. Throughout this work, the ISC staff worked closely with national center staff from 
the participating countries. 


The development and implementation work consisted of three phases: 


• Phase 1: During this first phase, which spanned May to August 2017, the ISC team, in discussion
with the national centers, reached agreement on the nature and scope of the survey’s contexts
and questions. During this phase, international project team members and national center staff
discussed the various draft versions of the survey and reached agreement on a final version.


• Phase 2: Between March 2018 and January 2019, the NRCs completed the national contexts
survey.


• Phase 3: The final phase took place between October 2018 and July 2019. During this phase,
ISC staff reviewed the collected information and, where necessary, verified the outcomes with
national centers.
During the development phase of the national contexts survey, the research team applied the
following criteria when considering which contexts and questions to include in it:

• Relevance of content with regard to the ICILS 2018 assessment framework;

• Relevance and additiona
Figure 5.1: ICILS 2018 instrument preparation workflow
While NRCs could decide where they considered adaptations necessary, in some cases they were
required to adapt certain terminology. The international version of the study instruments indicated
where such required adaptations were necessary. Examples of required adaptations included the
[target grade], personal names (e.g., [Male name 1]), places (e.g., [M-town]), and International
Standard Classification of Education (ISCED) level equivalents. Fictional technical and software-related
terminology also required adaptation in some cases. For instance, the term [WebSearch]
referred to an online search engine and needed to be adapted to a term that made sense to
students.


In the field trial (where adaptations were entered directly into the translation system), structural
and non-structural adaptations took place as two separate stages. For the main survey, however,
NRCs had to record all structural and non-structural adaptations for their study instruments at
the same time in a document called the national adaptation form (NAF). During the adaptation and
translation process, NRCs had to document the national version of the adapted text in the NAF
and include an English back translation and an explanation or comment if necessary.


While national adaptations are important to reflect the country context, maintaining comparability
with the international source version is essential. The meaning and difficulty level must therefore
be preserved, so cognitive items may not be simplified or clarified in a way that would help
students identify the correct answer. To help countries adapt their materials to the national
context, the ISC provided detailed notes that included definitions of specialized terminology in
the ICT context as well as general guidelines for translation and adaptation.


Translating the ICILS 2018 study instruments 

In order to create a high-quality translation, it is important to use professional and experienced 
translators and reviewers. For ICILS, IEA recommended using at least one translator and one 
translation reviewer for each target language, or more if only a limited amount of time was 
available. Ideally, the selected individuals should fulfill the following criteria: 


• Translators and reviewers should have an excellent knowledge of English.

• Translators and reviewers should be native speakers of the target language.

• Translators should be familiar with survey instruments.

• Translators should have experience in translating electronic texts, such as websites.


In the recommended workflow, the selected translator first translated all materials. The reviewer
then reviewed the translations, commented on their appropriateness, and proposed improvements.
Finally, the NRC finalized the materials before submitting them for translation verification. In
countries with more than one administered language, the same process applied to each language.
These countries also needed to perform equivalence checks between the two or more language
versions.


In countries where English was the administered language, the process of instrument preparation
was identical to that for all other languages, except that the source version of the instruments
required no translation, only adaptation to national English usage.


Guidelines for translation and adaptation 


Due to the complexity of languages and national contexts, it is difficult to provide participating
countries and benchmarking participants with explicit guidelines for translation and adaptation.
However, IEA provided NRCs with general guidelines and recommendations (included in the
Survey Operation Procedures Unit 3) to help them produce study instruments that are comparable
to the international source.
Table 5.1: Languages used for the ICILS 2018 study instruments

Participating country                 Administered language
Chile                                 Spanish
Denmark                               Danish
Finland                               Finnish, Swedish
France                                French
Germany*                              German
Italy                                 Italian
Kazakhstan                            Kazakh, Russian
Korea, Republic of                    Korean
Luxembourg                            English, French, German
Portugal                              Portuguese
United States                         English
Uruguay                               Spanish
Benchmarking participants
Moscow (Russian Federation)           Russian
North Rhine-Westphalia (Germany)*     German

Note: * The same translations were used in Germany and North Rhine-Westphalia.


In addition to the student assessment modules, NRCs had to prepare the following context
questionnaires for data collection:
• Student questionnaire
• Teacher questionnaire
• Principal questionnaire
• ICT coordinator questionnaire


National study centers also had to translate and adapt the school coordinator and test administrator
manuals as well as the scoring guides for constructed-response items. However, the ISC and IEA
did not monitor the translation and adaptation process for these documents and did not perform
any of the verification processes.


Translation systems 
In order to prepare the ICILS 2018 study instruments for the main survey, national study centers
were provided with two online translation systems: the IEA Translation System and AM Designer,
part of the AssessmentMaster (AM) system from RM Results (see Chapter 3 for more details about
the two systems). The online translation systems served as the main platform for all involved
parties to produce high-quality national instruments and to communicate with one another. The
IEA Translation System was used for the ICT coordinator, principal, and teacher questionnaires,
whereas AM Designer was used for the student instruments. Both systems allowed users to enter
and edit their translations and enabled all users to retrace changes made by other users, ensuring
transparency of the translation process. In addition, the systems displayed the differences between
a previously saved translation and the current translation in the form of track changes. This
function supported NRCs in their review process.
International verification processes


After the international instruments were adapted to the national context, translated, and internally
reviewed by the national centers, the national versions of the instruments were submitted for
external verification, which consisted of a rigorous three-part verification process: (1) adaptation
verification, (2) translation verification, and (3) layout verification.


Adaptation verification 


NRCs were asked to consult with ISC staff for a review of all proposed national adaptations. The ISC
particularly emphasized the need for NRCs to discuss any adaptation that might result in a serious
deviation from the international instruments. National centers began completing the NAF (Version
I) after reviewing the international version of the survey instruments. They then submitted the NAF
for consultation with the ISC, where two staff members reviewed the adaptations independently.
These reviews were then consolidated into one document and provided to the national centers,
with feedback on their adaptations and, where necessary, suggestions for better alignment with
the international source version.

Common issues identified during review of adaptations included the following:

• Inconsistent use of adaptations within or across modules and questionnaires;

• Fictional names for software or technologies not considered to be equivalent to the source
version; and

• Difficulties in establishing country-appropriate adaptations for ISCED levels.


The ISC asked the national centers to take the recommendations into account and update the
forms accordingly. The NAF was only finalized once the ISC and the national center agreed on
all adaptations. These updated forms (Version II) then informed the translation verification
process.


Translation verification 


Translation verification is the second of the three verification steps conducted during instrument
preparation. The goal of this process is to ensure high-quality translations of the national ICILS
2018 instruments and their equivalence to the international source version. IEA managed and
coordinated this process in cooperation with cApStAn Linguistic Quality Control.


International translation verifiers and their responsibilities 


The contracted, professional, international translation verifiers were native speakers of the
administered language. Further, the verifiers were certified translators working in English, with a
university degree, and ideally lived in the target country or at least had experience working in
the country context. Verifiers were required to attend a training seminar, where they received
information about the ICILS study design and online environment, as well as instructions regarding
the verification specifications and requirements.


The translation verifiers’ main responsibilities included reviewing the translations for the target
country and evaluating the accuracy and comparability of the national version of the ICILS
instruments with the international version. Their tasks further comprised documenting all deviations
in the country’s translations and adaptations and suggesting alternatives to improve the quality of
the translations. Verifiers checked whether the meaning and reading level of the text had been
affected and whether the test items had been made easier or more difficult. They also ensured that
no information was added or omitted. Translation verifiers made sure that the instruments contained
all correct items and response options in the right order and that adaptations had been recorded
and implemented correctly.
In addition to the general verification of the translations, international verifiers had to perform
a trend check for countries that participated in ICILS 2013. To ensure the quality and accuracy
of trend measurement on the ICILS achievement scale, it was of high importance to maintain
the exact wording of the translation that countries used in the previous ICILS cycle. This was
applicable to four countries that participated in ICILS 2018. IEA provided the verifiers with files
that contained the final translations from ICILS 2013. The verifiers’ task was to compare the
translation the country submitted for translation verification with the version of the last cycle and
to flag any detected differences.
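The trend check amounts to comparing two versions of the same text element by element. The following Python sketch shows one way to flag changed wording. It assumes, hypothetically, that each translation is available as a dictionary mapping a text-element ID to its text. In ICILS the comparison was done by human verifiers, and the function name `flag_trend_differences` is illustrative only.

```python
import difflib
from typing import Dict, List


def flag_trend_differences(previous: Dict[str, str],
                           submitted: Dict[str, str]) -> Dict[str, List[str]]:
    """Hypothetical helper: compare each submitted text element with the
    previous-cycle version and report word-level changes.

    Identical elements are not reported. Elements present in only one
    version are reported as missing.
    """
    flagged: Dict[str, List[str]] = {}
    for element_id in sorted(previous.keys() | submitted.keys()):
        old = previous.get(element_id)
        new = submitted.get(element_id)
        if old is None or new is None:
            flagged[element_id] = ["element missing in one version"]
        elif old != new:
            # ndiff marks removed words with "- " and added words with "+ ".
            # Unchanged words ("  ") and hint lines ("? ") are dropped.
            changes = [line for line in difflib.ndiff(old.split(), new.split())
                       if line.startswith(("- ", "+ "))]
            flagged[element_id] = changes or ["whitespace differs"]
    return flagged


# Hypothetical example: one item's wording changed between cycles.
icils_2013 = {"Q1": "Click the [WebSearch] icon", "Q2": "Open the file"}
submitted_2018 = {"Q1": "Click the [WebSearch] icon", "Q2": "Open the document"}
print(flag_trend_differences(icils_2013, submitted_2018))
```

Comparing at word level keeps the flagged output short for long item stems. A character-level diff could be substituted where small spelling changes matter.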


Translation verification procedure and verifiers’ feedback 


The translation verification process took place in the two online translation systems described
above. International verifiers could apply corrections or suggestions directly to the text elements
in the online environment. Any changes made by the verifier were displayed as track changes in the
text element. In addition, verifiers were required to document their corrections and suggestions,
describe the issue, and provide an English back translation in the form of comments for every
affected text element and in the NAF. To simplify the verification process and to make the verifiers’
feedback easier to understand, verifiers assigned so-called severity codes to every change or
suggestion. They used the following codes to indicate the severity of the issues they found:


• Code 1: Major change or error. The translation contained a severe error that affected the meaning
or difficulty of the item. Examples included mistranslations, translations that change the
meaning of a question, omitted or added information, incorrect order of questions or response
options, and incorrectly implemented national adaptations. Verifiers also used this code to flag
differences for trend translations.

• Code 1?: Used by verifiers whenever they were in doubt about the severity of an issue or unsure
of how to correct a possible error.

• Code 2: Minor change or error. The translation contained a minor error that did not affect
comprehension. Examples included spelling errors, grammatical errors, and syntax errors that
did not affect the comprehensibility of the question.

• Code 3: Suggestion for alternatives. Used by verifiers to suggest an alternative wording for an
otherwise appropriate translation.

• Code 4: Acceptable change. The translation was deemed acceptable and appropriate. Also used
by verifiers to indicate that a national adaptation had been documented and implemented
correctly.


After translation verification, IEA reviewed the verifiers’ feedback and returned the materials
to the national centers. NRCs were responsible for finalizing their instruments and thus had to
review the feedback carefully and consider the corrections and suggestions made by the verifier.


The procedure required NRCs to respond to every comment the verifier made in the IEA Translation
System, AM Designer, and the NAF. In principle, NRCs could accept, modify, or reject the verifier’s
suggestion. For the latter two options, NRCs had to explain why they changed or disagreed with
the suggestion.
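The severity codes and the NRC response rule described above can be represented as a small data model. The following Python sketch is illustrative only. `VerifierComment` and `unresolved` are hypothetical names, not part of the IEA Translation System or AM Designer. The sketch lists the comments that still lack a response, or that lack the explanation required for a modification or rejection.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Severity(Enum):
    """Verifier severity codes as listed in the text."""
    MAJOR = "1"        # major change or error (also used for trend differences)
    UNSURE = "1?"      # verifier unsure about severity or how to correct
    MINOR = "2"        # minor error that does not affect comprehension
    SUGGESTION = "3"   # alternative wording for an appropriate translation
    ACCEPTABLE = "4"   # acceptable change or correctly implemented adaptation


class Reaction(Enum):
    ACCEPT = "accept"
    MODIFY = "modify"
    REJECT = "reject"


@dataclass
class VerifierComment:
    """Hypothetical record of one verifier comment and the NRC's response."""
    element_id: str
    severity: Severity
    reaction: Optional[Reaction] = None   # None means not yet answered
    explanation: str = ""                 # required when modifying or rejecting


def unresolved(comments: List[VerifierComment]) -> List[str]:
    """Return IDs of comments that do not yet satisfy the response rule."""
    missing = []
    for c in comments:
        if c.reaction is None:
            missing.append(c.element_id)
        elif c.reaction is not Reaction.ACCEPT and not c.explanation.strip():
            missing.append(c.element_id)
    return missing
```

For example, a comment rejected without an explanation, or a comment with no reaction at all, would be returned by `unresolved`. An accepted comment would not.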


Typical errors found by the translation verifiers included mistranslations, literal translations,
inconsistencies of terms or phrases, omissions or additions of text, undocumented adaptations,
grammar and punctuation issues, and spelling mistakes. Some domain-specific concepts, such as
ICT-related technical terminology, were particularly challenging to translate into some languages.
According to the survey activities questionnaire, which NRCs completed to provide feedback on
survey operations, almost all participants found the translation verification feedback very useful (8)
or at least somewhat useful (3). The majority of the national centers (10) reported that they did
not experience any major problems during the
Before the national questionnaires were published online, the layout and structure of all online
CHAPTER 6:


Sampling design and implementation 


Sabine Tieck 


Introduction 


This chapter provides an introduction to the sampling design and the implementation of the ICILS
2018 student, teacher, and school surveys, which are consistent with those used
IEA Windows
supported NRCs in their sampling activities.


The sampling referee, Marc Joncas, advised on sampling methodology and reviewed and
adjudicated all national samples for this study.


Target population definitions 


When conducting a cross-country comparative survey, it is important to clearly define the target
population(s) under study. ICILS collected information from students, their teachers, and their
schools, which required clear definitions for all three populations. The definitions enabled ICILS
NRCs to correctly identify and list the targeted schools, students, and teachers from which samples
were to be selected.


Definition: Students
ICILS defined the target population of students as follows:

The student target population in ICILS consists of all students enrolled in the grade that represents eight
years of schooling, counting from the first year of ISCED Level 1,¹ providing the mean age at the time of
testing is at least 13.5 years.
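A minimal Python sketch of the two conditions in the student definition follows. It assumes grades are numbered consecutively, so that years of schooling can be counted from the grade in which ISCED Level 1 begins. `first_isced1_grade` is a hypothetical parameter. This excerpt does not state what happens when the age condition fails, so the sketch only reports whether both conditions hold.

```python
def years_of_schooling(grade: int, first_isced1_grade: int) -> int:
    """Years of schooling up to and including `grade`, counting the first
    year of ISCED Level 1 as year 1 (assumes consecutive grade numbering)."""
    return grade - first_isced1_grade + 1


def meets_student_definition(grade: int, first_isced1_grade: int,
                             mean_age_at_testing: float) -> bool:
    """True if the grade represents eight years of schooling and the mean
    age of its students at the time of testing is at least 13.5 years."""
    return (years_of_schooling(grade, first_isced1_grade) == 8
            and mean_age_at_testing >= 13.5)


# Hypothetical system where ISCED Level 1 starts in grade 1:
print(meets_student_definition(8, 1, 13.8))
print(meets_student_definition(8, 1, 13.4))
```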


the international population definition but decided to survey th
teachers) a
accordingly in the international report. To ensure international co
to specify their country’s legal school entry age, the name of the
the mean age of the students in that grade.


Hereafter, the term “students” is used to describe “students in the ICILS target population.”
1 ISCED stands for International Standard Classification of Education (UNESCO 1997).
Definition: Teachers
ICILS 2018 defined the target population of teachers as follows:
Teachers are defined as school staff members who provide student instruction through the delivery of
lessons to students. Teachers may work with students as a whole class in a classroom, in small groups in
resource rooms, or one-to-one inside or outside of classrooms.

The teacher target population in ICILS consists of all teachers that fulfill the following conditions: They are
teaching regular school subjects to students of the target grade (regardless of the subject or the number
of hours taught) during the ICILS testing period and since the beginning of the school year.²


School staff from the following categories were not regarded as part of the target teacher
population (i.e., were out of scope):

• Any school staff who attend to the needs of target-grade students but do not teach any lessons
(e.g., psychological counselors, chaplains);

• Assistant teachers and parent-helpers;

• Non-staff teachers who teach (non-compulsory) subjects that are not part of the curriculum
(e.g., cases where religion is not a regular subject and is taught by external persons); and

• Teachers who joined a school after the official start of the school year.


Hereafter, the term “teachers” is used to describe “teachers in the ICILS target population.”
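The in-scope and out-of-scope rules above can be expressed as a single eligibility check. The sketch below is hypothetical. Its record fields are illustrative simplifications of the definition, the listed exclusions, and the long-term leave rule in footnote 2. They are not fields of any ICILS data file.

```python
from dataclasses import dataclass


@dataclass
class StaffMember:
    """Hypothetical record of one school staff member (illustrative fields)."""
    is_school_staff: bool                 # False for external (non-staff) teachers
    teaches_lessons: bool                 # whole class, small groups, or one-to-one
    is_assistant_or_parent_helper: bool
    teaches_regular_subject: bool         # subject is part of the curriculum
    teaches_target_grade: bool            # during the ICILS testing period
    joined_after_school_year_start: bool
    on_long_term_leave: bool              # e.g., maternity or sabbatical leave


def in_teacher_target_population(s: StaffMember) -> bool:
    """Apply the teacher definition and the listed out-of-scope categories."""
    return (
        s.is_school_staff
        and s.teaches_lessons
        and not s.is_assistant_or_parent_helper
        and s.teaches_regular_subject
        and s.teaches_target_grade
        and not s.joined_after_school_year_start
        and not s.on_long_term_leave
    )
```

Writing the rule as one conjunction makes each exclusion category a single field that can be checked when teacher lists are compiled.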


Definition: Schools 


In ICILS 2018, schools were defined as follows:


A school is one whole unit with a defined number of teachers and students, which can include different 
programs or tracks. The definition of “school” should be based on the environment that is shared by 
students, which is usually a shared faculty, set of buildings, social space and also often includes a 
shared administration and charter. 


Schools eligible for ICILS are those at which target grade students are enrolled.


In order to ensure international comparability, the definition of “school” should be equivalent in 
all participating countries. In most cases, identifying schools for sampling purposes in ICILS was 
straightforward. However, there were some cases where identification of schools for sampling 
purposes was more difficult. National centers were provided with the following examples in order 
to help them identify sampling units: 


• Sub-units of larger “campus schools” (administrative “schools” consisting of smaller schools from
different cities or regions) should be regarded as separate schools for sampling purposes. If a
part of a larger campus school was selected for ICILS 2018, the principal or ICT coordinator
of the combined school was asked to complete the school questionnaire with respect to the
sampled sub-unit only.

• Schools that consist of two administrative units but have shared staff and shared buildings, and
offer some opportunities for students to change from one school to the other, should be regarded
as one combined school for sampling purposes.

• The parts of a school with two or more different study programs that have different teaching
staff, take place in different buildings, and offer no opportunity for students to change from one
study program to the other should be regarded as two or more separate schools for sampling
purposes. The study programs should be listed as separate units on the school sampling frame.
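The three examples for identifying sampling units can be read as a small decision rule. The Python sketch below encodes only those examples, and the function and enum names are hypothetical. Mixed situations (for instance, shared staff but separate buildings) are returned as not covered rather than resolved, because the text does not say how they were handled.

```python
from enum import Enum


class ListingDecision(Enum):
    COMBINED = "list as one combined school"
    SEPARATE = "list as separate schools"
    NOT_COVERED = "not covered by the examples"


def classify_units(shared_staff: bool, shared_buildings: bool,
                   students_can_switch: bool,
                   campus_subunits: bool = False) -> ListingDecision:
    """Hypothetical encoding of the three worked examples in the text."""
    if campus_subunits:
        # Sub-units of a campus school spread over different cities or regions.
        return ListingDecision.SEPARATE
    if shared_staff and shared_buildings and students_can_switch:
        # Two administrative units that share staff and buildings.
        return ListingDecision.COMBINED
    if not (shared_staff or shared_buildings or students_can_switch):
        # Study programs with separate staff and buildings and no transfers.
        return ListingDecision.SEPARATE
    # Mixed cases are not covered by the three examples.
    return ListingDecision.NOT_COVERED
```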


2 Teachers who are on long-term leave during the testing period (e.g., maternity or sabbatical leave) are not
within the scope of ICILS.
Coverage and exclusions


Population coverage 


The ICILS consortium encouraged participating countries to include in the study all schools, students,
and teachers defined in the target populations, in order to ensure full coverage of these target
populations.


However, it was deemed appropriate to exclude some schools, students, and teachers from the
target population for practical reasons, such as difficult test conditions or prohibitive survey costs.
Some students and teachers were not surveyed because their entire school had been removed from
the sampling frame (school-level exclusions), while some students (but not teachers) were excluded
within participating schools (within-sample exclusions).


As in other large-scale assessments, it should be emphasized that the ICILS 2018 samples represent
only the nationally defined target populations (that is, the internationally defined target population
without its excluded members).


School-level exclusions 

Table 6.1 gives an overview of the types of exclusions of schools and their respective percentages
of all schools in the desired national target population within each participating country. The
school-level percentages $ER_{sch}$ were computed as:

$$ER_{sch} = \frac{E_{sch}}{TP_{sch}} \times 100$$

$E_{sch}$ denotes the number of schools excluded prior to sample selection, and $TP_{sch}$ is the total
number of schools belonging to the national desired target population. The respective figures
were provided by the NRCs.
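The exclusion-rate formula is a simple percentage. The minimal Python sketch below uses hypothetical counts, since the real figures were supplied by the NRCs. The same function would apply to the students’ school-level rate $ER_1$ defined later in this section, with student counts in place of school counts.

```python
def exclusion_rate(excluded: int, total: int) -> float:
    """Percentage of the national desired target population that is excluded:
    ER_sch = E_sch / TP_sch * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= excluded <= total:
        raise ValueError("excluded must lie between 0 and total")
    return excluded / total * 100


# Hypothetical counts: 42 excluded schools out of 1,200 in the target population.
print(f"{exclusion_rate(42, 1200):.1f}")  # 3.5
```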


In most countries, very small schools and schools exclusively dedicated to students with special
needs were excluded. Frequently, schools following a curriculum that differed from the
mainstream curriculum were also not part of the nationally defined target population. Because
school-level data (collected via the principal and ICT coordinator questionnaires) in the ICILS
2018 survey were only used to complement the reporting of student- and teacher-level data, no
specific thresholds were set for exclusions at the school level. However, the percentages
of students and teachers excluded due to the removal of entire schools were considered when
determining the overall proportions of excluded students (see below).


School exclusions differed considerably across countries, a point that should be kept in mind when
interpreting results from school-level data. Note also that because school exclusions typically
concern small schools, the percentages of excluded schools tend to be higher than the
corresponding percentages of excluded students or teachers.
Table 6.1: Percentages of schools excluded from the ICILS target population

Country                             Type of exclusion                                                   Excluded schools
                                                                                                        (% of all schools)
Chile                               Very small schools (fewer than six students)                        5.0
                                    Special needs schools                                               0.1
                                    Geographically inaccessible                                         0.1
                                    Total                                                               Spill
Denmark                             Very small schools (fewer than five students)                       7.0
                                    Special needs schools                                               Sell
                                    Treatment centers                                                   1.6
                                    German, English, Waldorf schools                                    LS
                                    Total                                                               14.9
Finland                             Special needs schools                                               9.5
                                    Language schools (instructional language not Finnish or Swedish)    1.0
                                    Total                                                               10.5
France                              Overseas territories (TOM)                                          1.4
                                    Mayotte                                                             0.3
                                    Private schools without contract                                    8.3
                                    Specialized schools                                                 0.8
                                    Total                                                               10.8
Germany                             Special needs schools                                               8.9
                                    Very small schools (fewer than three students)                      1.4
                                    Total                                                               10.4
Italy                               Very small schools (fewer than six students)                        0.2
                                    Special needs schools                                               0.1
                                    Students taught in Slovene                                          0.1
                                    Schools in remote geographical area or in little islands            0.1
                                    Total                                                               0.5
Kazakhstan                          Students are taught in Uzbek language                               1.2
                                    Students are taught in Uighur language                              0.2
                                    Students are taught in Tadjik language                              0.1
                                    Students are taught in other language                               0.0
                                    Special needs schools                                               1.3
                                    Very small schools (fewer than four students)                       5.8
                                    Total                                                               8.6
Korea, Republic of                  Very small schools (fewer than five students)                       1.8
                                    Geographically inaccessible schools                                 4.4
                                    Physical education school                                           0.3
                                    Total                                                               6.6
Luxembourg                          No exclusions on school level                                       0.0
Portugal                            Very small schools (fewer than seven students)                      2.0
                                    International schools                                               1.1
                                    Total                                                               Sal
United States                       No exclusions on school level                                       0.0
Uruguay                             Special needs schools                                               0.2
                                    Geographically remote schools                                       8.8
                                    Total                                                               20
Benchmarking participants
Moscow (Russian Federation)         Special needs schools                                               2.5
                                    Very small schools (fewer than seven students)                      4.8
                                    Total                                                               ES
North Rhine-Westphalia (Germany)    Special needs schools                                               9.7
                                    Very small schools (fewer than three students)                      0.2
                                    Total                                                               9.8


Student-level exclusions 


Each country was required to keep the overall rate of excluded students (due to school-level and
within-school exclusions) below five percent (after rounding) of the desired target population. In
three education systems participating in ICILS 2018, the overall exclusion rate was above five
percent, which resulted in corresponding annotations in the ICILS 2018 international report (Fraillon
et al. 2020). Table 6.1 and Appendix B of this report provide details about the exclusion types
for each country.


The overall exclusion rate of students is the sum of the students’ school-level exclusion rate and
the weighted within-sample exclusion rate. Table 6.2 provides the respective percentages for
ICILS 2018 countries.


Table 6.2: Percentages of students excluded from the ICILS target population

Country | Students’ school-level exclusions (%) | Within-sample exclusions (%) | Overall exclusions (%)
Chile | 0.5 | 0.8 | 1.3
Denmark | 3 | 4.4 | 7D
Finland | 1.6 | 2.4 | 4.0
France | 3.4 | 1.3 | 4.7
Germany | 1.5 | 2.9 | 4.3
Italy | 0 | 2.9 | 3.0
Kazakhstan | 3.4 | 2.1 | 5.6
Korea, Republic of | 0.9 | 0.6 | 1.5
Luxembourg | 0.0 | 3.9 | 3.9
Portugal | 0.8 | 8.0 | 8.9
United States | 0.0 | 5.0 | 5.0
Uruguay | 1.4 | 0.0 | 1.4
Benchmarking participants
Moscow (Russian Federation) | 0.7 | 2.3 | 3.0
North Rhine-Westphalia (Germany) | 1.4 | oie | 4.6
Students’ school-level exclusions consisted of those students belonging to schools that were
excluded prior to school sampling. The students’ school-level exclusion rate $ER_1$ was
calculated as:


$$ER_1 = \frac{E_1}{TP} \times 100$$


$E_1$ denotes the number of target grade students in excluded schools, and $TP$ is the total number of
students belonging to the national desired target population. The respective figures were provided
by the NRCs. The students’ within-sample exclusions were based on information collected from the
sample (i.e., after the school sampling step). The within-sample exclusions consisted of students
with physical or mental disabilities or students who could not speak the language of the test
(usually, students with less than one year of instruction in the test language). Students could be
excluded prior to the within-school sampling or after the within-school sampling was performed.³
The percentage of within-sample student exclusions was calculated using the weighted number of
students excluded within schools and the estimated total number of students belonging to the
national desired target population.


The students’ within-sample exclusion rate $ER_2$ was computed as:

$$ER_2 = \frac{W_E}{W_P + W_E} \times \left(1 - \frac{ER_1}{100}\right) \times 100$$

$W_E$ is the sum of weights of excluded students and $W_P$ is the sum of weights of participating
students. Therefore, $W_P + W_E$ denotes the (estimated) total number of students belonging to
the nationally desired target population. The students’ school-level exclusion rate is taken into
account by multiplying by $(1 - ER_1/100)$, the proportion of the population not already excluded
at school level.


The overall exclusion rate of students $ER_{tot}$ is the sum of the students’ school-level exclusion rate
and the weighted within-sample exclusion rate:

$$ER_{tot} = ER_1 + ER_2$$


Table 6.2 provides the respective percentages for ICILS 2018 countries. 
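The three exclusion rates can be computed directly from these definitions. The following Python function is a minimal sketch; the function name, argument names, and the input figures in the usage line are hypothetical and do not come from any country's data. It treats $ER_1$ as a percentage, so the school-level adjustment is applied as $1 - ER_1/100$.

```python
def exclusion_rates(e1, tp, w_excl, w_part):
    """Return (ER1, ER2, ERtot) in percent.

    e1     -- target grade students in schools excluded before sampling
    tp     -- students in the national desired target population
    w_excl -- sum of weights of students excluded within sampled schools
    w_part -- sum of weights of participating students
    """
    er1 = e1 / tp * 100
    # Weighted within-sample share, scaled to the part of the population
    # that was not already excluded at school level.
    er2 = w_excl / (w_part + w_excl) * (1 - er1 / 100) * 100
    return er1, er2, er1 + er2


# Hypothetical inputs for illustration only.
er1, er2, er_tot = exclusion_rates(e1=1_500, tp=100_000, w_excl=2_900, w_part=97_100)
print(f"ER1 = {er1:.2f}%, ER2 = {er2:.2f}%, ERtot = {er_tot:.2f}%")
```

With these inputs $ER_1$ is 1.5 percent and $ER_{tot}$ is about 4.36 percent, just under the five percent limit.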


Teacher-level exclusions 


Teachers working in excluded schools were not part of the nationally defined target population. 
Within participating schools, all teachers who met the target population definition were eligible 
for participation in the survey. 


Each country was asked to provide information about the total number of teachers teaching in the 
target grade as well as the proportion of teachers teaching in the target grade in excluded schools. 
For Germany, Finland, and North Rhine-Westphalia (Germany), no statistics on the number of 
eligible ICILS 2018 teachers were available and therefore it was not possible to compute exclusion 
rates. Teacher exclusion rates exceeded five percent in Denmark, France, and the United States. 


3. In some cases, these students were grouped in classes, which were then excluded as a group, or the sampled schools were
found to have only enrolled students within the exclusion categories, which resulted in the corresponding school(s)
being excluded ex post.
School sampling design


IEA used a stratified two-stage probability cluster sampling design to select the school samples
for all ICILS 2018 countries. During the first stage, schools were selected systematically with
probabilities proportional to their size (PPS), as measured by the total number of enrolled target
grade students. During the second stage, students enrolled in the target grade were selected within
participating schools using a systematic random sampling approach.

The following subsections provide further details on the sample design for ICILS 2018.
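The first-stage selection can be illustrated with a short Python sketch of PPS systematic sampling, assuming the frame is a list of (school identifier, MOS) pairs already sorted by the implicit stratification variables. The function and variable names are hypothetical, and this is not the IEA sampling software. A random start is drawn within the sampling interval, and each school whose cumulative MOS range contains a selection point is selected.

```python
import random

def pps_systematic(frame, n, rng=random):
    """Select n schools with probability proportional to size (PPS).

    frame -- list of (school_id, mos) pairs, sorted by implicit strata
    n     -- number of schools to select
    """
    total = sum(mos for _, mos in frame)
    interval = total / n                 # sampling interval
    start = rng.random() * interval      # random start in [0, interval)
    points = [start + k * interval for k in range(n)]

    selected, cumulative, k = [], 0, 0
    for school_id, mos in frame:
        cumulative += mos                # upper end of this school's range
        # A school is hit by every selection point falling in its range;
        # a school larger than the interval could be hit twice.
        while k < n and points[k] < cumulative:
            selected.append(school_id)
            k += 1
    return selected


frame = [(f"S{i}", mos) for i, mos in
         enumerate([45, 120, 80, 30, 200, 60, 95, 150, 25, 70])]
print(pps_systematic(frame, 3, random.Random(2018)))
```

Because the selection points are spread evenly over the sorted list, the sort order also spreads the sample across the implicit strata. The possibility of a double hit is one reason why very large schools are handled separately.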


School sampling frame 


In order to prepare the selection of school samples, national centers provided a comprehensive
list of schools including the numbers of students enrolled in the target grade. This list is referred
to as the school sampling frame. To ensure that each ICILS 2018 school sampling frame provided
complete coverage of the desired target population, the sampling team carefully checked and
verified the plausibility of the information by comparing it with official statistics.


The sampling team required the following information for each eligible school in the sampling frame:

• A unique identifier, such as a national identification number;

• The school’s measure of size (MOS), which was usually the number of students enrolled in the
target grade or an adjacent grade; and

• Values for each of the intended stratification variables.

Stratification entails the grouping of sampling frame units (schools in the case of ICILS 2018)
by common characteristics. Examples for such groups of units would be geographic region,
urbanization level, source of funding, or performance level. ICILS 2018 applied two different
methods of stratification. Implicit stratification means that the sampling frame has been sorted,
prior to sampling, by implicit stratification variables, thus providing a simple and straightforward
method to achieve a fairly proportional sample allocation across all strata. With explicit
stratification, independent samples of schools are selected from each explicit stratum. The sample
sizes for each explicit stratum are assigned before the selection process in order to achieve the
desired sample precision overall and, where required, also for subpopulations.

Generally, IEA studies use stratification for the following reasons:

• They use implicit or explicit stratification to improve the [...], thereby [...]

• [...] national centers identify s[...] students’ learning-outcom[...]

• They use explicit stratification [...] of schools.

[...] The latter design feature was used if the country required [...] public and private schools but
only 10 percent of the students [...]
Country | Explicit stratification variables (number of variable characteristics) | Number of explicit strata | Implicit stratification variables (number of variable characteristics)
Chile | Grade (Grade 8 and 9/Grade 8 only) (2); Administration (public/private/private subsidized) (3); Urbanization level (2) | 10 | Performance level (4)
Denmark | None | | National achievement score (5)
Finland | Language of instruction (2); Region (Helsinki & Uusimaa/Southern/Western/Northern & Eastern) (4); Urbanization (2) | 9 | Within Western, and Northern & Eastern stratum: Region (4); Within Swedish speaking strata: Urbanization (2); Within Swedish speaking school: Region (2)
France | School administration (3); Urbanization (3); Digital equipment level (2) | 18 | None
Germany | North Rhine-Westphalia/other federal states (2); Track (gymnasium/nongymnasium/special needs schools) (3) | 5 | SES indicator (3); Federal state (16)
Italy | Region (North/Central/South) (3) | 3 | Administration (2); Performance level (5)
Kazakhstan | Urbanization (urban/rural) (2); Language of instruction (4) | 8 | None
Korea, Republic of | Urbanization (3); School gender (3) | 9 | None
Luxembourg | Schools following national curriculum/Schools following other curriculum (2) | 2 | None
Portugal | Administration (2); Region (25) | 28 | None
United States | Poverty level (2); Administration (2); Region (4) | 12 | Urbanization (5); Ethnicity status (2)
Uruguay | Administration (2); Region (2); School type (2) | 6 | None
Benchmarking participants
Moscow (Russian Federation) | Performance level (5) | | Administration (2)
North Rhine-Westphalia (Germany) | Track (gymnasium/nongymnasium/special needs schools) (3) | | SES indicator (3)
Figure 6.1: Visualization of PPS systematic sampling
Source: Zuehlke 2011.


In some cases, the sampling design deviated from this general procedure:

• Small schools were selected with equal selection probabilities to avoid large variations of
sampling weights due to changing size measures. Usually a school was regarded as “small” if
the number of enrolled target grade students was less than 20.

• Very large schools (i.e., schools with more students than the value of the sampling interval)
were placed into a separate explicit stratum and selected with certainty (i.e., all schools in this
category were included in the sample).
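The two deviations can be sketched as a preprocessing step applied to one explicit stratum before PPS selection. This Python sketch rests on stated assumptions: following the weighting section of this chapter, small schools (fewer than 20 target grade students) receive the average MOS of all small schools in the stratum, and a school is taken with certainty when its MOS exceeds the sampling interval. Because removing certainty schools changes the interval, the check is repeated. All names are hypothetical.

```python
def prepare_stratum(frame, n, small_threshold=20):
    """Return (certainty_ids, remaining_frame, n_remaining) for one stratum.

    frame -- list of (school_id, enrolment) pairs
    n     -- number of schools to sample in the stratum
    """
    # Small schools: equal selection probabilities via a common (average) MOS.
    small = [m for _, m in frame if m < small_threshold]
    avg_small = sum(small) / len(small) if small else 0.0
    remaining = [(s, avg_small if m < small_threshold else m) for s, m in frame]

    # Very large schools: selected with certainty; recompute the interval
    # on what is left until no school exceeds it.
    certainty = []
    while n > 0 and remaining:
        interval = sum(m for _, m in remaining) / n
        large = [s for s, m in remaining if m > interval]
        if not large:
            break
        certainty.extend(large)
        remaining = [(s, m) for s, m in remaining if m <= interval]
        n -= len(large)
    return certainty, remaining, n


stratum = [("A", 500), ("B", 60), ("C", 50), ("D", 10), ("E", 14), ("F", 40), ("G", 45)]
print(prepare_stratum(stratum, n=3))
```

In this made-up stratum, school A is taken with certainty, schools D and E share an MOS of 12, and two schools remain to be drawn by PPS from the other six.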


All countries conducted a field trial with a small sample of schools one year before the ICILS main
survey. Where possible, it is preferable that no school is selected to participate in both the field
trial and the main survey data collection. This is because selection in both the field trial and the
main survey may reduce the likelihood that a school agrees to participate, and also because
information about the contents of the instruments may be shared within a school, which could
influence the data collected in the main survey. In ICILS, we prevented this by selecting the samples
of schools for the field trial and the main survey simultaneously (as part of a single larger sampling
procedure) in each country. For example, if a country was planning to sample 25 schools in the field
trial and 150 schools in the main survey, then a single combined sample of 175 schools was selected
first. The main survey sample of 150 schools was then subsampled from these schools, leaving the
remaining 25 schools for the field trial sample.
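One way to implement the split is sketched below in Python. The text does not specify how the main survey schools were subsampled from the combined sample; this sketch assumes an equal-probability systematic subsample taken from the combined sample in its selection order, and its names are hypothetical.

```python
import random

def split_combined_sample(combined, n_main, rng=random):
    """Split a combined school sample into main survey and field trial parts.

    combined -- school ids in selection order
    n_main   -- number of schools for the main survey
    """
    if not 0 < n_main <= len(combined):
        raise ValueError("n_main must be between 1 and len(combined)")
    step = len(combined) / n_main          # >= 1, so the indices are distinct
    start = rng.random() * step
    last = len(combined) - 1
    main_idx = {min(int(start + k * step), last) for k in range(n_main)}
    main = [s for i, s in enumerate(combined) if i in main_idx]
    field_trial = [s for i, s in enumerate(combined) if i not in main_idx]
    return main, field_trial


combined = [f"S{i:03d}" for i in range(175)]
main, field_trial = split_combined_sample(combined, 150, random.Random(2017))
print(len(main), len(field_trial))   # 150 25
```

Because both parts come from one selection, no school can appear in both the field trial and the main survey.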
Student sampling
Teacher sampling


As was the case for student sampling, IEA WinW3S employed systematic stratified sampling with
equal selection probabilities to select teachers from comprehensive lists of in-scope teachers
provided by the participating schools. The procedure also ensured a near-proportional sample
allocation among subgroups. To increase sample precision, teacher lists were implicitly stratified
by sorting them according to gender, main subject domain, and birth year prior to sampling.
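A Python sketch of this within-school teacher selection follows. It assumes each teacher record is a dictionary with the hypothetical keys `id`, `gender`, `subject`, and `birth_year`, and it applies the sample size rule given under teacher sample sizes (15 teachers, or all teachers when a school has 20 or fewer). It is an illustration, not IEA WinW3S code.

```python
import random

def sample_teachers(teachers, n=15, take_all_up_to=20, rng=random):
    """Systematic equal-probability sample of teacher ids in one school."""
    # Implicit stratification: sort the list before taking the sample.
    ordered = sorted(teachers, key=lambda t: (t["gender"], t["subject"], t["birth_year"]))
    if len(ordered) <= take_all_up_to:
        return [t["id"] for t in ordered]       # small list: take everyone
    step = len(ordered) / n                     # sampling interval (> 1 here)
    start = rng.random() * step                 # random start in [0, step)
    return [ordered[min(int(start + k * step), len(ordered) - 1)]["id"]
            for k in range(n)]


rng = random.Random(7)
staff = [{"id": f"T{i:02d}", "gender": rng.choice("FM"),
          "subject": rng.choice(["language", "math", "science"]),
          "birth_year": rng.randint(1960, 1995)} for i in range(42)]
print(sample_teachers(staff, rng=rng))
```

Because the list is sorted first, an evenly spaced sample spreads across gender, subject, and age groups roughly in proportion to their share of the staff, which is what implicit stratification achieves.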


6 For very large schools (see above), it is not always possible to assign two replacement schools, as the school directly
preceding or following the sampled school is either another sampled school or a replacement school assigned to another
sampled school. In extreme cases, no replacement school can be assigned. This could also happen if the number of schools
to sample from a stratum is very high compared to the number of schools in this stratum. If the sampled school is the
first or last in its stratum, usually the two following or the two preceding schools, respectively, are assigned. Please
note further that for the field trial only one replacement school was assigned.
Sample size requirements


The ICILS 2018 consortium, in line with practice in other IEA studies, set high standards for
sampling precision and aimed to achieve reasonably small standard errors for survey estimates.
The student sample should ensure a specified level of precision for population estimates, defined
by confidence intervals of ±0.1 standard deviation for means and ±5% for percentages. With
respect to the main outcome variable in ICILS 2018, that is, the students’ computer and information
literacy (CIL) score scale established in ICILS 2013 with a mean of 500 score points and a standard
deviation of 100 for equally weighted national samples from the first cycle, this requirement
translated into standard errors that needed to be below five score points. IEA was responsible for
determining sample sizes that were expected to meet these requirements for each participating
country. With the exception of one participating country (the United States), which failed to meet
the IEA sample participation standards, all participating countries and education systems achieved
this requirement (see Fraillon et al. 2020, p. 75). The required precision levels of percentages were
also met for the vast majority of population estimates presented in the ICILS international report
(Fraillon et al. 2020).
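The step from a confidence interval to a standard error threshold can be checked with a few lines of Python. The text does not state the confidence level; this sketch assumes a two-sided 95 percent interval (z = 1.96), under which a half-width of 0.1 standard deviations (10 score points) corresponds to a maximum standard error of about 5.1 score points, in line with the five-point target.

```python
Z_95 = 1.96  # assumed: two-sided 95% normal quantile

def max_standard_error(half_width, z=Z_95):
    """Largest SE for which the confidence interval stays within +/- half_width."""
    return half_width / z

sd = 100.0                                           # CIL scale standard deviation
print(round(max_standard_error(0.1 * sd), 2))        # means: 5.1 score points
print(round(max_standard_error(5.0), 2))             # percentages: 2.55 points
```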


There were also other considerations that needed to be taken into account when determining
the required number of sampled students and teachers:

• Some types of analysis, like multilevel modeling, require a minimum number of valid cases at
each sampling stage (see, for example, Meinck and Vandenplas 2012);

• For the purpose of building scales and subscales, a minimum number of valid entries per
response item is required; and

• Reporting on subgroups (e.g., age or gender groups) requires a minimum sample size for each
of the subgroups of interest.

All these considerations were taken into account during the process of defining minimum sample
sizes for schools, students, and teachers.


School sample sizes 


The minimum sample size for the ICILS main survey was 150 schools for each country.⁷ In some
countries it was necessary to select more schools than the minimum sample size due to one or
more of the following reasons:

• Previous student surveys had shown a relatively large variation of student achievement between
schools in a country. In these cases, it was assumed that the IEA standards for sampling precision
could only be met by increasing the school sample size.

• The number of schools with fewer than 20 students in the target grade was relatively large, so
that it was not possible to reach the student sample size requirements by selecting only 150
schools (see next section below).

• The country requested oversampling of particular subgroups of schools to accommodate
national research interests.


Student sample sizes 

Typically, 20 students were randomly selected from the full target grade cohort (i.e., across all
classes) in each sampled school. In schools with 25 or fewer students in the target grade, all
students were selected.⁸


7 Luxembourg conducted a census of schools, i.e., all 41 schools were asked to participate.

8 In the United States, a minimum of 30 students were randomly selected, because students to be excluded could only
be identified after the within-school sample was drawn. In Luxembourg, a census of students was used, so all
students were selected.
Each country was required to have an achieved student sample size of about 3000 tested
students. Due to non-response, school closures, or other factors, some countries did not meet this
requirement. The ICILS 2018 sampling team did not regard this outcome as problematic as long
as the country met the overall participation rate requirements (see Chapter 7).

Teacher sample sizes

Typically, 15 teachers of the target grade were randomly selected in each sampled school. In schools
with 20 or fewer teachers of the target grade, all teachers were selected.⁹

In summary, the minimum sample size requirements for ICILS were as follows:

• Schools: 150 in each country

• Students: 20 (or all) per school

• Teachers: 15 (or all) per school

Table 6.4 lists the intended and achieved school sample sizes, the achieved student sample sizes,
and the achieved teacher sample sizes for each participating country. Note that schools may have
been treated as participating in the student survey but not in the teacher survey and vice versa
due to specific minimum within-school response rate requirements. This explains differences in the
numbers of participating schools for the student and teacher survey across ICILS 2018 countries.¹⁰


Table 6.4: School, student, and teacher sample sizes

Country | Originally sampled schools | Student survey: participating schools | Student survey: participating students | Teacher survey: participating schools | Teacher survey: participating teachers
Chile | 180 | 178 | 3092 | 174 | 686
Denmark | 150 | 4 | 2404 | 138 | 118
Finland | 150 | 4 | 2546 | 143 | 853
France | 156 | 156 | 2940 | 122 | 462
Germany | 234 | 209 | 3655 | 182 | 2328
Italy | 150 | 150 | 2810 | 148 | 775
Kazakhstan | 186 | 183 | 3371 | 184 | 2623
Korea, Republic of | 150 | 150 | 2875 | 147 | 2127
Luxembourg | 41 | 38 | 5401 | 28 | 494
Portugal | 220 | 200 | 3221 | 208 | 2823
United States | 352 | 263 | 6790 | 259 | 3218
Uruguay | 177 | 166 | 2613 | 171 | 1320
Benchmarking participants
Moscow (Russian Federation) | 150 | 150 | 2852 | 150 | 2235
North Rhine-Westphalia (Germany) | 115 | 109 | 1991 | 107 | 1468


9 In Luxembourg, the minimum sample size was increased to 25 teachers per school, due to the small number of schools.
10 Please refer to Chapter 7 for details on ICILS 2018 standards for sampling participation.
Efficiency of the ICILS 2018 sample design

As already noted, ICILS 2018 set specific goals in terms of sampling precision, especially
that standard errors should be kept below specific thresholds. Let us illustrate this concept in some
more detail for readers who are not as familiar with this topic.


In any sample survey, researchers would like to use data collected from the sample to get a good
(or “precise”) picture of the population from which the sample was drawn. However, there is a need
to define what is “good” in terms of sampling precision. Statisticians aim for a sample that has as
little variance and bias as possible for specific design and cost limits. A measure of the precision is
the standard error. The larger the standard error, the more “blurred” the picture is, and the less
reliable inferences from sample data to populations become.


Let us assume our population of interest is the left-hand picture in Figure 6.2 below, the famous
picture of Einstein taken by Arthur Sasse in 1951. The picture consists of 340,000 pixels. We
can draw samples with increasing numbers of pixels from this picture and reassemble the picture
using only the sampled pixels. As can be seen in the middle and right-hand pictures in Figure 6.2,
the picture obtained from the sampled pixels becomes more precise as the sample size increases.
In this example, the sharpness of each reassembled picture reflects the standard error of the
corresponding sample size: the larger the sample, the smaller the standard error and the sharper
the picture.


Figure 6.2: Illustration of sampling precision—simple random sampling 


Picture = Population Sample size = 10,000 Sample size = 50,000 


Determining sampling precision in infinite populations is relatively straightforward as long as
simple random sampling (SRS) is employed. The standard error of the estimate of the mean $\bar{x}$
from a simple random sample can be estimated as:

$$\sigma_{\bar{x}} = \sqrt{\frac{\sigma^2}{n}}$$

with $\sigma^2$ being the (unknown) variance in the population and $n$ being the sample size. If the
variance in the population is known, the sample size needed for a given precision level can be easily
derived from the formula. For example, assuming the standard deviation $\sigma$ of an achievement scale
to be 100, the population variance $\sigma^2$ would be 10,000. If the standard error of the estimated
scale mean $\sigma_{\bar{x}}$ is required to be five scale score points or less, rearranging the formula
above to $n = \sigma^2 / \sigma_{\bar{x}}^2 = 10{,}000 / 25$ leads to a required minimum sample size of
400 students per country. As pointed out earlier, however, the actual minimum sample size for
participating countries in ICILS 2018 was 3000 students.
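The rearrangement can be written as a small Python helper (the name is hypothetical), which reproduces the figure of 400 students for σ = 100 and a target standard error of 5 points.

```python
import math

def srs_sample_size(sd, target_se):
    """Smallest n with sd / sqrt(n) <= target_se under simple random sampling."""
    return math.ceil((sd / target_se) ** 2)

print(srs_sample_size(sd=100, target_se=5))     # 400
print(srs_sample_size(sd=100, target_se=2.5))   # 1600: halving the SE quadruples n
```

This holds only for SRS; the following paragraphs explain why a clustered design needs considerably more students for the same precision.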
The key reason for this larger sample size requirement is that ICILS 2018 did not employ SRS but
cluster sampling. Students in the sample are members of “clusters,” as groups of them belong
to the same schools.


Students within a school tend to be more similar to one another than students from different schools
because they are exposed to the same environment and teachers. Furthermore, they also often
share common socioeconomic backgrounds. Therefore, the gain in information through sampling
additional individual students within schools is less than when sampling additional schools, even
if the total sample size is kept constant. In other words, due to the homogeneity of students in the
same schools, the sampling precision of cluster samples tends to be lower than that of simple random
samples of similar size.
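The loss of precision from clustering can be demonstrated with a small Python simulation. The population below is entirely hypothetical: 400 schools of 100 students, with school means varying around 500 and students varying around their school mean. Both designs sample 400 students, either at random from the whole population or as 20 students from each of 20 randomly chosen schools, and the variance of the sample mean is compared over repeated draws.

```python
import random
import statistics

def compare_designs(n_schools=400, school_size=100, school_sd=40.0,
                    student_sd=90.0, n_clusters=20, per_cluster=20,
                    reps=300, seed=42):
    """Return (variance of mean under SRS, variance of mean under cluster sampling)."""
    rng = random.Random(seed)
    schools = []
    for _ in range(n_schools):
        school_mean = rng.gauss(500, school_sd)        # between-school spread
        schools.append([rng.gauss(school_mean, student_sd)
                        for _ in range(school_size)])  # within-school spread
    everyone = [x for school in schools for x in school]
    n = n_clusters * per_cluster                       # same total sample size

    srs_means, cluster_means = [], []
    for _ in range(reps):
        srs_means.append(statistics.fmean(rng.sample(everyone, n)))
        picked = rng.sample(schools, n_clusters)
        cluster_means.append(statistics.fmean(
            [x for school in picked for x in rng.sample(school, per_cluster)]))
    return statistics.variance(srs_means), statistics.variance(cluster_means)


var_srs, var_cluster = compare_designs()
print(f"SRS: {var_srs:.1f}  cluster: {var_cluster:.1f}  ratio: {var_cluster / var_srs:.1f}")
```

With these settings the variance of the cluster-sample mean is several times larger than that of the SRS mean for the same number of students; the ratio is an empirical counterpart of the design effect defined later in this section.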


For this reason, the SRS formula given above is not applicable for data from cluster samples. In
fact, and depending also on the outcome variable being measured, applying this formula will most
likely underestimate standard errors from cluster sample data by a considerable margin. Figure
6.3 visualizes this effect through the example of the Einstein portrait, where the number of pixels
sampled is the same in both pictures. However, in the right-hand picture, clusters of pixels were
sampled rather than single pixels as in the left-hand picture.


Figure 6.3: Sampling precision with equal sample sizes—simple random sampling versus cluster sampling 


Note that stratification also has an influence on sampling precision. Through the choice of
stratification variables related to the outcome variables, it is possible to increase the sampling
precision compared to non-stratified samples. However, experience shows that in large-scale
assessments in education the impact of stratification on sampling precision tends to be much
smaller than the effect of clustering. Stratification is another reason why the SRS formula
for estimating sampling variance is not applicable for ICILS 2018 survey data.


Because of the above reasons, estimation of sampling variance for complex sample data is not as 
straightforward as it is for simple random samples. Chapter 13 of this report explains in more detail 
the jackknife repeated replication (JRR) method which should be used for a correct estimation of 
standard errors for ICILS 2018 data. 


The achieved efficiency of the ICILS sampling design is measured by the design effect:

$$deff = \frac{Var_{JRR}}{Var_{SRS}}$$

where $Var_{JRR}$ is the design-based sampling variance for a statistic estimated by the JRR method,¹¹
and $Var_{SRS}$ is the estimated sampling variance for the same statistic on the same data base but
[...] (with replacement, conditional on the achieved [...]). [...] estimate the design effect in a given
country from [...] equivalent outcome variables, we can also determine [...] of 400 students. Within
the context of large-scale [...] is an estimate of the sample size that would be [...] of a cluster sample
if simple random sampling [...] a country where the design effect in a previous survey was [...] the
effective sample size would provide an estimate [...] that the samples apply [...] within schools [...]
This information helps to determine [...]

[...] to the design effect of the CIL and the [...]; the effective sample size relates to the average
design effects across the presented scales. As is evident from Table 6.5, all national samples
achieved or, more often, [...] the envisaged effective sample size of 400. Table 6.6 shows that the
effective [...] teacher sample size was also estimated at 400 or above in all ICILS 2018 countries.
[...] the ICILS 2018 main outcome [...] the CIL scale is equal to 2.6. Most other scales pertaining to
the student survey showed even lower design effects. The average design effect of the CT scale is
2.0 and tends to be lower than the CIL scale design effect in most ICILS 2018 countries that
administered the optional CT assessment. The teacher-related scales had an average design effect
of 2.6 across ICILS 2018 countries.
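The two quantities used in this discussion can be sketched in Python as follows. The design effect is the ratio of the JRR variance to the SRS variance, and the effective sample size is the achieved sample size divided by the design effect, which matches the relationship between the columns of Table 6.6. The variances in the example are hypothetical; in practice $Var_{JRR}$ would come from the JRR procedure described in Chapter 13.

```python
def design_effect(var_jrr, var_srs):
    """deff = Var_JRR / Var_SRS for the same statistic on the same data."""
    return var_jrr / var_srs

def effective_sample_size(n, deff):
    """SRS sample size giving the same precision as the achieved cluster sample."""
    return n / deff

# Hypothetical country: 3000 tested students, SD of 100 score points.
n = 3000
var_srs = 100.0 ** 2 / n            # SRS variance of the mean
var_jrr = 3.3 ** 2                  # assumed JRR variance (SE of 3.3 points)
deff = design_effect(var_jrr, var_srs)
print(f"deff = {deff:.2f}, effective n = {effective_sample_size(n, deff):.0f}")
```

Here an effective sample size of about 918 is comfortably above the envisaged 400, and the assumed JRR standard error of 3.3 points stays below the five-point threshold.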


11 The measurement error for the CIL scales is included in $Var_{JRR}$. Chapter 13 provides further details on measurement
error estimation.
Table 6.6: Design effects and effective sample sizes of mean of scale scores and plausible values—Student survey

Country | Sample size | Design effect: mean of scale scores | Design effect: CIL plausible values 1-5 | Design effect: CT plausible values 1-5 | Effective sample size: mean of scale scores | Effective sample size: CIL plausible values 1-5 | Effective sample size: CT plausible values 1-5
Chile | 3092 | 2.7 | 4.1 | N/A | 1128 | 761 | N/A
Denmark | 2404 | 1.6 | 1.8 | 1.5 | 1515 | 1358 | 1611
Finland | 2546 | 1.6 | 2.5 | 2.2 | 1589 | 1037 | 1157
France | 2940 | 1.6 | 1.8 | 1.5 | 1847 | 1595 | 1899
Germany | 3655 | 2.3 | 3.1 | 3.0 | 1578 | 1186 | 1232
Italy | 2810 | 1.6 | 2.3 | N/A | 1758 | 1247 | N/A
Kazakhstan | 3371 | 3.6 | 5.4 | N/A | 934 | 622 | N/A
Korea, Republic of | 2875 | 2.2 | 2.1 | 2.9 | 1325 | 1368 | 980
Luxembourg | 5401 | 0.8 | 0.8 | 0.6 | 6445 | 6779 | 8447
Portugal | 3221 | 2.1 | 2.9 | 2.3 | 1517 | 1099 | 1424
United States | 6790 | 2.3 | 2.6 | 2.6 | 2984 | 2649 | 2637
Uruguay | 2613 | 1.7 | 3.1 | N/A | 1574 | 843 | N/A
Benchmarking participants
Moscow (Russian Federation) | 2852 | 1.9 | 2.2 | N/A | 1490 | 1283 | N/A
North Rhine-Westphalia (Germany) | 1991 | 1.4 | 1.8 | 1.5 | 1428 | 1097 | 1293
ICILS 2018 average | 3326 | 2.0 | 2.6 | 2.0 | 1937 | 1637 | 2298

Note: N/A = not available.
[...] schools. This approach allows us to exploit the (usually) little information we have available about
respondents and non-respondents, and to assume that school non-participation is associated
with the different strata (see also Lohr 1999). The approach also assumes a non-informative
response model, implying that non-response occurs completely at random within the adjustment
cell (i.e., in ICILS 2018, within a stratum).


Calculating student weights 
School base weight (WGTFAC1) 


The first sampling stage involved selecting schools in each country; the school base weight reflects
the selection probabilities of this sampling step. When explicit stratification was used, the school
samples were selected independently from within each explicit stratum $h$, with $h = 1, \ldots, H$. If no
explicit strata were formed, the entire country was regarded as being one explicit stratum.

Systematic random samples of schools were drawn in all countries, with a selection probability
of school $i$ in stratum $h$ proportional to its size (PPS sampling). The measure of school size $M_{hi}$
was defined by the number of students in the target grade or an adjacent grade. If schools were
small ($M_{hi} < 20$), the measure of size $M_{hi}$ was redefined as the average size of all small schools in
that stratum. In a few countries, equiprobable systematic random sampling (SyRS) was applied
in particular strata.


The school base weight was defined as the inverse of the school’s selection probability. For school
$i$ in stratum $h$, the school base weight was given by:

$$WGTFAC1_{hi} = \frac{M_h}{n_h \cdot M_{hi}} \quad \text{for PPS sampling and}$$

$$WGTFAC1_{hi} = \frac{N_h}{n_h} \quad \text{for SyRS}$$

where $n_h$ is the number of sampled schools in stratum $h$, $M_h$ is the total number of students enrolled
in the schools of explicit stratum $h$, $M_{hi}$ is the measure of size of the selected school $i$, and $N_h$ is the
total number of schools in stratum $h$.
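Both base-weight formulas translate directly into code. The Python sketch below uses hypothetical stratum figures; note that under PPS sampling the weight is inversely proportional to the school's MOS, so larger schools receive smaller school weights.

```python
def school_base_weight_pps(M_h, n_h, M_hi):
    """WGTFAC1 under PPS: inverse of the selection probability n_h * M_hi / M_h."""
    return M_h / (n_h * M_hi)

def school_base_weight_syrs(N_h, n_h):
    """WGTFAC1 under equal-probability systematic sampling (SyRS)."""
    return N_h / n_h

# Hypothetical stratum: 12,000 students in 120 schools, 30 schools sampled.
print(school_base_weight_pps(M_h=12_000, n_h=30, M_hi=80))    # 5.0
print(school_base_weight_pps(M_h=12_000, n_h=30, M_hi=200))   # 2.0
print(school_base_weight_syrs(N_h=120, n_h=30))               # 4.0
```

Combined with the student base weight $M_{hi}/m_{hi}$ described below, the PPS school weight makes the product $M_h/(n_h \cdot m_{hi})$, which is roughly constant within a stratum when the same number of students is sampled per school and the MOS matches actual enrolment.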


School non-response adjustment (WGTADJ1S) 


School base weights for participating schools needed to be adjusted to account for the loss in
overall sample size from schools that either refused to participate or had to be removed from
the international dataset due to low within-school participation. Adjustments were calculated
within non-response groups defined by the explicit strata. A school non-response adjustment was
calculated for each participating school $i$ within each explicit stratum $h$ as:

$$WGTADJ1S_{hi} = \frac{n_h^{s-e}}{n_h^{p-std}}$$

where $n_h^{s-e}$ is the number of sampled eligible schools and $n_h^{p-std}$ is the number of participating
schools (whether originally sampled or replacement schools) in the student survey in explicit stratum $h$.

The number $n_h^{s-e}$ in this section is not necessarily equal to $n_h$ in the preceding section, as $n_h^{s-e}$ was
restricted to schools deemed eligible in ICILS 2018. Because of the lapse of one or two years
between school sampling and the actual assessment, some selected schools were no longer eligible
for participation. This happened if schools had been closed recently, did not have students in the
target grade, or had only excluded students enrolled. Ineligible schools such as these were not
taken into account when calculating the non-response adjustment.
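A sketch of the adjustment in Python, assuming a hypothetical list of sampled school records in which `participated` is true if either the originally sampled school or one of its replacements took part. Ineligible schools are dropped before counting, as described above. The same function would serve for the teacher survey adjustment if the participation flag referred to the teacher survey.

```python
from collections import Counter

def school_nonresponse_adjustment(schools):
    """Return {stratum: eligible sampled schools / participating schools}."""
    eligible, participating = Counter(), Counter()
    for s in schools:
        if not s["eligible"]:
            continue                      # ineligible schools are ignored
        eligible[s["stratum"]] += 1
        if s["participated"]:
            participating[s["stratum"]] += 1
    # Strata without participating schools cannot be adjusted this way.
    return {h: eligible[h] / participating[h]
            for h in eligible if participating[h] > 0}


sample = (
    [{"stratum": "urban", "eligible": True, "participated": p} for p in (True, True, True, False)]
    + [{"stratum": "urban", "eligible": False, "participated": False}]
    + [{"stratum": "rural", "eligible": True, "participated": True}] * 2
)
print(school_nonresponse_adjustment(sample))   # urban: 4/3, rural: 1.0
```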
Student base weight (WGTFAC3S)


The IEA Windows Within-School Sampling Software (IEA WinW3S) was used to manage within-school
sampling during the second sampling stage (see Chapter 8 for details) in order to conduct
a systematic random selection of students from the target grade. The student base weight for
student $k$ was calculated as:

$$WGTFAC3S_{hik} = \frac{M_{hi}}{m_{hi}}$$

where $M_{hi}$ is the total number of students in the target grade in school $i$ in stratum $h$ and $m_{hi}$ is the
number of sampled students in school $i$ in stratum $h$.


In schools with fewer than 26 target grade students, all eligible students were selected for
participation. In these cases, the weight factor was set to a value of one.²
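A Python sketch of the student base weight together with the general selection rule (20 students, or all students when there are 25 or fewer). Luxembourg and the United States deviated from this rule, which the sketch does not cover. Function names are hypothetical.

```python
def students_to_sample(M_hi, n=20, take_all_up_to=25):
    """Number of students selected in a school under the general rule."""
    return M_hi if M_hi <= take_all_up_to else n

def student_base_weight(M_hi):
    """WGTFAC3S = M_hi / m_hi; equals 1 when every student is selected."""
    return M_hi / students_to_sample(M_hi)

for size in (18, 25, 26, 87):
    print(size, students_to_sample(size), student_base_weight(size))
```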


Student non-response adjustment (WGTADJ3S) 


Unfortunately, not all selected students were able or willing to participate in ICILS 2018. To account
for the reduction in sample size due to within-school non-participation, a student non-response
adjustment factor was introduced. Given the lack of information about absentees, non-participation
has to be assumed, for weighting purposes, to be completely random within schools. This means
that participating students represent both participating and non-participating students within a
surveyed school. Accordingly, their sampling weights had to be adjusted.

The adjustment for student non-response for each participating student $k$ was calculated as:

$$WGTADJ3S_{hik} = \frac{m_{hi}^{e}}{m_{hi}^{p}}$$

with $m_{hi}^{e}$ being the number of sampled eligible students in school $i$ in stratum $h$ and $m_{hi}^{p}$ being the
number of participating students in school $i$ in stratum $h$. In the context of student weight adjustment,
students of the target population were regarded as eligible if they had not been excluded due to
disabilities or language problems.³
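A minimal Python sketch of the student non-response adjustment for one hypothetical school. Excluded students are removed from the eligible count, while students who left the school after sampling stay in it and are counted as absent, as described in the following paragraph.

```python
def student_nonresponse_adjustment(eligible, participating):
    """WGTADJ3S = sampled eligible students / participating students."""
    if participating == 0:
        raise ValueError("no participating students in this school")
    return eligible / participating

# Hypothetical school: 20 sampled, 1 excluded, 2 absent (incl. one who moved).
sampled, excluded, absent = 20, 1, 2
eligible = sampled - excluded            # 19
participating = eligible - absent        # 17
print(round(student_nonresponse_adjustment(eligible, participating), 3))   # 1.118
```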


Please note that sampled students who did not participate in the survey because they had left the
sampled school after within-school sampling were counted as absent in the sampled school. These
students were assumed to remain part of the target population (they moved to a different school
but had a zero chance of selection there, since within-school sampling had already been completed
at this point). Excluded students within participating schools carried their weight (i.e., reflecting
their proportion in the target population) and thus contributed to the overall estimates of exclusion.


Final student weight (TOTWGTS) 


The final student weight of student $k$ in school $i$ in stratum $h$ is the product of the four
student-weight components:

$$TOTWGTS_{hik} = WGTFAC1_{hi} \times WGTADJ1S_{hi} \times WGTFAC3S_{hik} \times WGTADJ3S_{hik}$$
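The final weight is a plain product, sketched in Python below with hypothetical component values. The second part shows why the within-school factors fit together: summed over the participating students of a school, $WGTFAC3S \times WGTADJ3S$ recovers the estimated number of eligible target grade students in that school, while the remaining weight (4.35 here) is carried by the excluded student.

```python
from math import prod

def total_student_weight(wgtfac1, wgtadj1s, wgtfac3s, wgtadj3s):
    """TOTWGTS as the product of the four student-weight components."""
    return prod((wgtfac1, wgtadj1s, wgtfac3s, wgtadj3s))

# Hypothetical school: 87 target grade students, 20 sampled, 1 excluded,
# 17 participating; school base weight 5.0 and school adjustment 4/3.
wgtfac3s = 87 / 20
wgtadj3s = 19 / 17
print(total_student_weight(5.0, 4 / 3, wgtfac3s, wgtadj3s))

# Within-school weights of the 17 participants add up to 87 * 19 / 20.
print(sum(wgtfac3s * wgtadj3s for _ in range(17)))   # about 82.65
```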


2 Two countries deviated from this rule: in Luxembourg, all students were sampled, and in the United States, the sample
size for the students was set to 30.

3 For Chile, this adjustment factor includes a gender adjustment factor, as the original population estimates regarding the
distribution of girls and boys did not match the recorded proportion of boys and girls in Chile (although the estimated
total number of students did match the recorded population figures). This could be because the distribution of
single-sex schools was not controlled for in the sample. The adapted adjustment factor fits the population estimates
regarding the population size and the gender distribution.
Calculating teacher weights
School base weight (WGTFAC1) 


As the same schools were sampled for the student survey and the teacher survey, the school base 
weight of the teacher survey was identical to the school base weight of the student survey. 


School non-response adjustment (WGTADJ1T) 


A school non-response adjustment for the teacher survey was calculated in the same way as the
school non-response adjustment for the student survey. Because a school could be regarded as
participating in the student survey but not in the teacher survey, and vice versa, the school
non-participation adjustment could differ between the student and teacher data from the same
school. To account for non-responding schools in the sample, a school weight adjustment for the
teacher survey was calculated for school i as follows:


$$\mathrm{WGTADJ1T}_{hi} = \frac{n^{se}_{h}}{n^{p\text{-}tch}_{h}}$$


Here, $n^{se}_{h}$ is again the number of sampled eligible schools and $n^{p\text{-}tch}_{h}$ is the number of schools
participating (whether originally sampled or replacement schools) in the teacher survey in stratum h.
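Because the adjustment is a ratio of school counts within each explicit stratum, it can be computed by tallying schools per stratum. The sketch below is a hypothetical illustration: it assumes that each sampled eligible school is recorded once as a (stratum, participated) pair, where the slot counts as participating if either the original school or its replacement took part in the teacher survey. The function name and the toy data are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def school_nonresponse_adjustments(
        slots: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return {stratum h: n_se / n_p} for one survey component.

    Each slot is (stratum, participated) for one sampled eligible school;
    participated is True if the original or its replacement school took part.
    Strata without any participating school are omitted, because the ratio
    is undefined for them.
    """
    eligible: Counter = Counter()
    participating: Counter = Counter()
    for stratum, participated in slots:
        eligible[stratum] += 1
        if participated:
            participating[stratum] += 1
    return {h: eligible[h] / participating[h]
            for h in eligible if participating[h] > 0}


slots = [("A", True), ("A", True), ("A", False), ("A", True),
         ("B", True), ("B", True)]
print(school_nonresponse_adjustments(slots))  # stratum A: 4/3, stratum B: 1.0
```

The same counting pattern applies to the school-survey adjustment WGTADJ1C, with participation defined by completed school questionnaires instead.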


Teacher base weight (WGTFAC2T) 


A systematic random sampling method, carried out with the IEA WinW3S software, was used to
select teachers in each school.


The teacher base weight for teacher l was calculated as:


$$\mathrm{WGTFAC2T}_{hil} = \frac{T_{hi}}{t_{hi}}$$


where $T_{hi}$ is the total number of eligible teachers in school i in stratum h and $t_{hi}$ is the number of
sampled teachers in school i in stratum h.
In schools with fewer than 21 target-grade teachers, all eligible teachers were selected for
participation. In these cases, this weight factor was equal to one.⁴
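The selection and the resulting base weight can be illustrated with a short sketch. It assumes a textbook form of systematic random sampling (a random start followed by a fixed, possibly fractional, interval through the teacher list); the exact algorithm implemented in WinW3S is not described here, and the planned number of teachers per school (`n_planned`) is a hypothetical parameter. The take-all threshold follows the text: every teacher is selected when a school has fewer than 21 target-grade teachers (fewer than 25 in Luxembourg).

```python
import random
from typing import List, Sequence, Tuple


def systematic_sample(teachers: Sequence[str], t: int,
                      rng: random.Random) -> List[str]:
    """Select t of the listed teachers by systematic random sampling."""
    T = len(teachers)
    if not 0 < t <= T:
        raise ValueError("need 0 < t <= T")
    interval = T / t                      # sampling interval k = T / t (>= 1)
    start = rng.random() * interval       # random start in [0, k)
    # min() guards against a floating-point overshoot on the last pick
    return [teachers[min(int(start + j * interval), T - 1)] for j in range(t)]


def teacher_base_weight(T: int, n_planned: int,
                        take_all_below: int = 21) -> Tuple[float, int]:
    """Return (WGTFAC2T, t): the base weight T / t and the number selected."""
    t = T if T < take_all_below else min(n_planned, T)
    return T / t, t


rng = random.Random(2018)
listed = [f"teacher_{n:02d}" for n in range(30)]
weight, t = teacher_base_weight(len(listed), n_planned=15)
sample = systematic_sample(listed, t, rng)
print(weight, len(sample))
```

With 30 listed teachers and a hypothetical plan of 15, every selected teacher represents two teachers (weight 2.0); a school with 18 teachers would be taken in full and get weight 1.0.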


Teacher non-response adjustment (WGTADJ2T) 


Not all teachers were willing or able to participate in the study. Therefore, participating teachers 
represented both participants and non-participants. Again, the non-response adjustment carried 
out within a given school assumed, for weighting purposes, that there was a random process 
underlying teachers’ participation. 


The non-response adjustment was computed for each participating teacher l as:
$$\mathrm{WGTADJ2T}_{hil} = \frac{t^{se}_{hi}}{t^{p}_{hi}}$$
where $t^{se}_{hi}$ is the number of eligible sampled teachers and $t^{p}_{hi}$ is the number of participating teachers in
school i in stratum h. Teachers who left the school after they had been sampled but before data
collection were regarded as out of scope, and their weights were not adjusted in these instances.


4 In Luxembourg, the number of teachers to select was increased to 20; thus, all eligible teachers were selected in schools
with fewer than 25 target-grade teachers.
Teacher multiplicity factor (WGTFAC3T)


Some teachers in ICILS 2018 were teaching at the target grade in more than one school (based 
on information from the teacher questionnaire) and therefore had a larger selection probability 
than those teaching at the target grade in only one school. In order to account for this, a “teacher 
multiplicity factor” was calculated as the inverse of the number of schools in which the teacher 
was teaching: 


$$\mathrm{WGTFAC3T}_{hil} = \frac{1}{f_{hil}}$$

Here, $f_{hil}$ is the number of schools where teacher l in school i in stratum h was teaching.


Final teacher weight (TOTWGTT) 


The final teacher weight for teacher l in school i in stratum h is the product of the five teacher-weight components:


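$$\mathrm{TOTWGTT}_{hil} = \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTADJ1T}_{hi} \times \mathrm{WGTFAC2T}_{hil} \times \mathrm{WGTADJ2T}_{hil} \times \mathrm{WGTFAC3T}_{hil}$$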
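Putting the teacher components together, the final weight can be sketched as follows. The inputs are hypothetical and the helper names are illustrative only; the school base weight and the school non-response adjustment are assumed to be computed elsewhere. Because teachers who left the school before data collection are out of scope, they are assumed not to be counted among the eligible sampled teachers.

```python
from math import prod


def teacher_nonresponse_adjustment(t_eligible: int, t_participating: int) -> float:
    """WGTADJ2T: eligible sampled teachers / participating teachers in the school."""
    if t_participating <= 0:
        raise ValueError("no participating teachers in this school")
    return t_eligible / t_participating


def multiplicity_factor(f: int) -> float:
    """WGTFAC3T = 1 / f, where f is the number of schools the teacher taught in."""
    if f < 1:
        raise ValueError("f must be at least 1")
    return 1.0 / f


def final_teacher_weight(wgtfac1: float, wgtadj1t: float, wgtfac2t: float,
                         wgtadj2t: float, wgtfac3t: float) -> float:
    """TOTWGTT: product of the five teacher-weight components."""
    return prod([wgtfac1, wgtadj1t, wgtfac2t, wgtadj2t, wgtfac3t])


# Hypothetical teacher: 15 eligible sampled teachers, 12 responded, and this
# teacher reported teaching the target grade in two schools.
w = final_teacher_weight(40.0, 1.0, 2.0,
                         teacher_nonresponse_adjustment(15, 12),
                         multiplicity_factor(2))
print(w)
```

Halving the weight of a teacher who teaches in two schools offsets the doubled chance of being selected through either school.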


Calculating school weights 


ICILS 2018 was designed as a survey of students and teachers, not as a school survey. However,
in order to collect background information at the school level, a principal questionnaire and an ICT
coordinator questionnaire were administered to every participating school. School weights were
calculated and included in the international database to allow for analyses at the school
level. Results at the school level should nevertheless be interpreted with some caution, as they may
be subject to considerable sampling error.


School base weight (WGTFAC1) 


This weight component is identical to the school base weight of the student survey and the teacher 
survey (see above). 


School weight adjustment (WGTADJ1C) 


Schools in which no items were completed in either the principal questionnaire or the ICT 
coordinator questionnaire were regarded as non-participants in the school survey. In order to 
account for these non-responding schools, a school weight adjustment component was calculated 
for each participating school i as follows: 


$$\mathrm{WGTADJ1C}_{hi} = \frac{n^{se}_{h}}{n^{p\text{-}sch}_{h}}$$


Here, $n^{se}_{h}$ represents the number of eligible sampled schools and $n^{p\text{-}sch}_{h}$ represents the number of
schools with completed questionnaires in stratum h (whether originally sampled or replacement
schools).


Note that some schools may have been non-participants in the school survey but participated in the
student and/or the teacher surveys. Consequently, some schools were regarded as participants in
the student and/or teacher survey but as non-participants in the school survey. Some schools may
also have completed (at least one of) the school-level questionnaires but were regarded as non-participants
in the student and/or teacher surveys. It is very important to keep this in mind when
undertaking analyses with data from different data sets. In multivariate analyses that combine
different data sources, the proportion of missing values may accumulate with increasing numbers
of variables. Those undertaking secondary analyses should thoroughly monitor the potential loss
of information due to missing data across the different sampling units.




ICILS 2018 TECHNICAL REPORT 


Final school weight 


The final school weight of school i in stratum h is the product of the two weight components:


$$\mathrm{TOTWGTC}_{hi} = \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTADJ1C}_{hi}$$


Calculating participation rates 


For ICILS 2018, weighted and unweighted participation rates were calculated at the student and
teacher levels to facilitate the evaluation of data quality and reduce the risk of potential bias due
to non-response. In contrast to the weight adjustments described earlier, participation rates
were computed first considering the originally sampled schools only and then considering both originally
sampled and replacement schools.


Unweighted participation rates in the student survey 


Let op denote the set of originally sampled eligible and participating schools, fp the full set of eligible
participating schools, including replacement schools, and np the set of sampled eligible but non-participating
schools in the student survey. Let $n_{op}$, $n_{fp}$, and $n_{np}$ denote the numbers of schools in
each of the respective sets. The unweighted school participation rate in the student survey before
replacement is calculated as:


$$\mathrm{UPRS}_{\mathrm{schools\_BR}} = \frac{n_{op}}{n_{fp} + n_{np}}$$


The unweighted school participation rate in the student survey after replacement is computed as: 


$$\mathrm{UPRS}_{\mathrm{schools\_AR}} = \frac{n_{fp}}{n_{fp} + n_{np}}$$
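The two unweighted school rates share a denominator and differ only in whether replacement schools enter the numerator. The sketch below uses hypothetical counts and assumes that $n_{np}$ counts sampled eligible schools for which neither the original nor a replacement school participated, so that $n_{fp} + n_{np}$ equals the number of sampled eligible schools; that reading is an assumption made here for illustration.

```python
from typing import Tuple


def unweighted_school_rates(n_op: int, n_fp: int, n_np: int) -> Tuple[float, float]:
    """Return (UPRS_schools_BR, UPRS_schools_AR) as proportions.

    n_op: originally sampled eligible schools that participated
    n_fp: all eligible participating schools, replacements included
    n_np: sampled eligible schools without a participating school
    """
    if not 0 <= n_op <= n_fp:
        raise ValueError("n_op must lie between 0 and n_fp")
    denominator = n_fp + n_np
    return n_op / denominator, n_fp / denominator


# Hypothetical country: 150 sampled eligible schools; 130 originals took part,
# 12 replacements took part, and 8 slots yielded no participating school.
before, after = unweighted_school_rates(130, 142, 8)
print(f"{100 * before:.1f}% before, {100 * after:.1f}% after replacement")
```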


Let sfp represent the set of eligible and participating students in all participating schools, that is,
in the schools that constitute fp, the complete set of eligible participating schools. Let snp be the
set of eligible but non-participating students in schools that constitute fp, and let $n_{sfp}$ and $n_{snp}$ be
the numbers of students in these two respective groups. The unweighted student response rate
is computed as:


$$\mathrm{UPRS}_{\mathrm{students}} = \frac{n_{sfp}}{n_{sfp} + n_{snp}}$$


Note that it was not deemed necessary to compute student response rates separately for 
originally sampled and replacement schools because non-response patterns did not vary between 
(participating) originally sampled and replacement schools. 


The unweighted overall participation rate in the student survey before replacement is then: 


$$\mathrm{UPRS}_{\mathrm{overall\_BR}} = \mathrm{UPRS}_{\mathrm{schools\_BR}} \times \mathrm{UPRS}_{\mathrm{students}}$$


The unweighted overall participation rate in the student survey after replacement is: 


$$\mathrm{UPRS}_{\mathrm{overall\_AR}} = \mathrm{UPRS}_{\mathrm{schools\_AR}} \times \mathrm{UPRS}_{\mathrm{students}}$$
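The student response rate and the two overall rates follow directly from the counts. A minimal sketch with hypothetical counts (the function name is illustrative); it mirrors the formulas in the text, including the point that student counts are taken from all participating schools (fp) only, so a single student rate serves both the before- and after-replacement products.

```python
from typing import Tuple


def unweighted_overall_rates(n_op: int, n_fp: int, n_np: int,
                             n_sfp: int, n_snp: int) -> Tuple[float, float]:
    """Return (UPRS_overall_BR, UPRS_overall_AR) as proportions.

    n_sfp / n_snp: participating / non-participating eligible students,
    counted in all participating schools (fp) only.
    """
    schools_br = n_op / (n_fp + n_np)
    schools_ar = n_fp / (n_fp + n_np)
    students = n_sfp / (n_sfp + n_snp)
    return schools_br * students, schools_ar * students


# Hypothetical counts: 150 sampled eligible school slots (130 original and 12
# replacement schools participating, 8 without a participating school), and
# 3000 participating plus 300 non-participating students in those schools.
br, ar = unweighted_overall_rates(130, 142, 8, 3000, 300)
print(round(100 * br, 1), round(100 * ar, 1))
```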


Weighted participation rates in the student survey 


The weighted school participation rate in the student survey before replacement was calculated 
as the ratio of summations of all participating students k in stratum h and school i: 


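$$\mathrm{WPRS}_{\mathrm{schools\_BR}} = \frac{\sum_{h}\sum_{i \in op}\sum_{k \in sfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC3S}_{hik} \times \mathrm{WGTADJ3S}_{hik}}{\sum_{h}\sum_{i \in fp}\sum_{k \in sfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTADJ1S}_{hi} \times \mathrm{WGTFAC3S}_{hik} \times \mathrm{WGTADJ3S}_{hik}}$$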
Here, the students in the numerator were computed as the sum over the originally sampled
participating schools only, whereas the students in the denominator were calculated as the total 
over all participating schools. 

The weighted school participation rate in the student survey after replacement is therefore: 


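$$\mathrm{WPRS}_{\mathrm{schools\_AR}} = \frac{\sum_{h}\sum_{i \in fp}\sum_{k \in sfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC3S}_{hik} \times \mathrm{WGTADJ3S}_{hik}}{\sum_{h}\sum_{i \in fp}\sum_{k \in sfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTADJ1S}_{hi} \times \mathrm{WGTFAC3S}_{hik} \times \mathrm{WGTADJ3S}_{hik}}$$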
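Both weighted school rates can be computed in one pass over the participating students' weight components. The record layout below is a hypothetical illustration, not the structure of the ICILS database files. With equal weights, as in this toy stratum, the weighted rates reduce to the unweighted ones (2 of 4 sampled schools before replacement, 3 of 4 after), which makes a convenient sanity check.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class StudentRecord:
    """One participating student in a participating school (set fp)."""
    original_school: bool   # True if the school was originally sampled (set op)
    wgtfac1: float          # school base weight
    wgtadj1s: float         # school non-response adjustment (student survey)
    wgtfac3s: float         # student base weight
    wgtadj3s: float         # student non-response adjustment


def weighted_school_rates(students: Iterable[StudentRecord]) -> Tuple[float, float]:
    """Return (WPRS_schools_BR, WPRS_schools_AR) as proportions."""
    num_br = num_ar = denominator = 0.0
    for s in students:
        base = s.wgtfac1 * s.wgtfac3s * s.wgtadj3s
        denominator += base * s.wgtadj1s      # all participating schools, adjusted
        num_ar += base                        # all participating schools, unadjusted
        if s.original_school:
            num_br += base                    # originally sampled schools only
    return num_br / denominator, num_ar / denominator


# Hypothetical stratum: 4 sampled schools, 3 participating (2 original, 1
# replacement), so WGTADJ1S = 4/3; each school contributes 2 students.
records = [StudentRecord(orig, 10.0, 4 / 3, 5.0, 1.0)
           for orig in (True, True, True, True, False, False)]
print(weighted_school_rates(records))
```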


The weighted student participation rate was computed as follows, taking again replacement 
schools into account: 


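$$\mathrm{WPRS}_{\mathrm{students}} = \frac{\sum_{h}\sum_{i \in fp}\sum_{k \in sfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC3S}_{hik}}{\sum_{h}\sum_{i \in fp}\sum_{k \in sfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC3S}_{hik} \times \mathrm{WGTADJ3S}_{hik}}$$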


The weighted overall participation rate in the student survey before replacement is therefore: 


$$\mathrm{WPRS}_{\mathrm{overall\_BR}} = \mathrm{WPRS}_{\mathrm{schools\_BR}} \times \mathrm{WPRS}_{\mathrm{students}}$$


The weighted overall participation rate in the student survey after replacement is: 


$$\mathrm{WPRS}_{\mathrm{overall\_AR}} = \mathrm{WPRS}_{\mathrm{schools\_AR}} \times \mathrm{WPRS}_{\mathrm{students}}$$
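In the weighted student participation rate, WGTADJ3S appears only in the denominator. Because that adjustment inflates each participant to represent all eligible students in the school, the denominator estimates the eligible population and the ratio estimates the share who took part. A minimal sketch with hypothetical weights (function names are illustrative):

```python
from typing import Iterable, Tuple


def weighted_student_rate(students: Iterable[Tuple[float, float, float]]) -> float:
    """WPRS_students from (WGTFAC1, WGTFAC3S, WGTADJ3S) of participating students in fp."""
    numerator = denominator = 0.0
    for wgtfac1, wgtfac3s, wgtadj3s in students:
        numerator += wgtfac1 * wgtfac3s
        denominator += wgtfac1 * wgtfac3s * wgtadj3s
    return numerator / denominator


def weighted_overall_rate(school_rate: float, student_rate: float) -> float:
    """WPRS_overall (before or after replacement) as a product of two rates."""
    return school_rate * student_rate


# Hypothetical school: 16 of 20 eligible students took part, so WGTADJ3S = 20/16.
students = [(10.0, 2.0, 20 / 16)] * 16
rate = weighted_student_rate(students)
print(rate, weighted_overall_rate(0.5, rate))
```

With a single school the weighted rate equals the plain share of eligible students who took part (16 of 20).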


Overview of participation rates in the student survey 


Table 7.1 and Table 7.2 display the unweighted and weighted participation rates of all countries
in the student survey. Differences between the two tables indicate different response patterns
among strata with disproportional sample allocations. For example, the unweighted school
participation rate for Germany was considerably higher than the weighted rate because the federal
state of North Rhine-Westphalia, in which almost all schools participated, was oversampled within
Germany. North Rhine-Westphalia participated in ICILS both as a state within Germany and as a
benchmarking participant. In comparison, relatively fewer schools participated in the remaining
strata across Germany.


It should be noted that only those schools with a participation rate of at least 50 percent among
their sampled students were treated as participants in the student survey. A school that did not
meet this requirement was regarded as a non-participating school for the student data collection.
The non-participation of such a school affected the school participation rate; however, the
students from this school were excluded from the calculation of the student participation rate.
Table 7.1: Unweighted school and student participation rates—Student survey


                                  School participation rate (%) Student        Overall participation rate (%)
                                  Before        After           participation  Before        After
Country                           replacement   replacement     rate (%)       replacement   replacement

Chile                             91.1          99.4            93.6           85.2          [?]
Denmark                           76.0          95.3            85.3           64.8          81.3
Finland                           97.9          98.6            91.8           89.9          90.6
France                            99.4          100.0           94.7           94.1          94.7
Germany                           84.3          91.3            88.5           74.6          80.8
Italy                             95.3          100.0           95.0           90.6          95.0
Kazakhstan                        99.5          99.5            97.9           97.3          97.3
Korea, Republic of                100.0         100.0           96.7           96.7          96.7
Luxembourg                        92.7          92.7            90.1           83.5          83.5
Portugal                          86.3          91.3            81.1           70.0          74.1
United States                     67.5          76.9            90.8           61.4          69.9
Uruguay                           91.9          95.9            80.6           74.0          77.3
Benchmarking participants
Moscow (Russian Federation)       98.7          100.0           95.8           94.5          95.8
North Rhine-Westphalia (Germany)  92.9          97.3            91.6           85.0          89.1
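Because each overall rate is the product of a school rate and the student rate, the tables can be cross-checked by hand. The sketch below recomputes Denmark's two overall rates from Table 7.1 as a consistency check only; the function name is illustrative. Since the published figures were presumably computed from unrounded rates, recomputing from rounded table entries can be off by 0.1: Finland's after-replacement value recomputes as 98.6 × 91.8 / 100 ≈ 90.5, against 90.6 in the table.

```python
def overall_rate(school_pct: float, within_pct: float) -> float:
    """Overall participation rate (%) as school rate (%) x within-school rate (%)."""
    return school_pct * within_pct / 100


# Denmark, Table 7.1: school 76.0 (before) and 95.3 (after), students 85.3
for school_pct in (76.0, 95.3):
    print(round(overall_rate(school_pct, 85.3), 1))   # 64.8, then 81.3
```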


Table 7.2: Weighted school and student participation rates—Student survey 


                                  School participation rate (%) Student        Overall participation rate (%)
                                  Before        After           participation  Before        After
Country                           replacement   replacement     rate (%)       replacement   replacement

Chile                             91.0          100.0           93.1           84.8          93.1
Denmark                           75.6          95.3            84.8           64.1          80.8
Finland                           98.3          98.6            91.9           90.3          90.6
France                            99.4          100.0           95.0           94.4          95.0
Germany                           78.9          88.3            86.6           68.3          76.5
Italy                             95.1          100.0           94.9           90.3          94.9
Kazakhstan                        99.5          99.5            97.6           97.2          97.2
Korea, Republic of                100.0         100.0           96.7           96.7          96.7
Luxembourg                        96.4          96.4            90.1           86.9          86.9
Portugal                          85.7          90.2            80.0           68.6          72.2
United States                     67.4          77.1            91.0           61.4          70.2
Uruguay                           90.7          95.7            80.2           72.8          76.8
Benchmarking participants
Moscow (Russian Federation)       98.2          100.0           95.7           93.9          95.7
North Rhine-Westphalia (Germany)  92.6          97.4            91.0           84.2          88.6
Unweighted participation rates in the teacher survey


The computation of participation rates in the teacher survey follows the same logic as applied in 
the student survey. 


Let op, fp, and np be defined as above, such that the participation status now refers to the teacher
survey instead of the student survey, and let $n_{op}$, $n_{fp}$, and $n_{np}$ be defined correspondingly. The
unweighted school participation rate in the teacher survey before replacement is computed as:


$$\mathrm{UPRT}_{\mathrm{schools\_BR}} = \frac{n_{op}}{n_{fp} + n_{np}}$$


The unweighted school participation rate in the teacher survey after replacement is calculated as: 


$$\mathrm{UPRT}_{\mathrm{schools\_AR}} = \frac{n_{fp}}{n_{fp} + n_{np}}$$


Let tfp be the set of eligible and participating teachers in schools that constitute fp, tnp be the set
of eligible but non-participating teachers in schools that constitute fp, and let $n_{tfp}$ and $n_{tnp}$ be the
numbers of teachers in the respective groups. The unweighted teacher response rate is defined as:


$$\mathrm{UPRT}_{\mathrm{teachers}} = \frac{n_{tfp}}{n_{tfp} + n_{tnp}}$$


Note that it was not deemed necessary to compute teacher response rates separately for 
(participating) originally sampled and replacement schools because the non-response patterns 
did not vary between sample and replacement schools. 


The unweighted overall participation rate in the teacher survey before replacement is computed as: 


$$\mathrm{UPRT}_{\mathrm{overall\_BR}} = \mathrm{UPRT}_{\mathrm{schools\_BR}} \times \mathrm{UPRT}_{\mathrm{teachers}}$$


The unweighted overall participation rate in the teacher survey after replacement is calculated as: 


$$\mathrm{UPRT}_{\mathrm{overall\_AR}} = \mathrm{UPRT}_{\mathrm{schools\_AR}} \times \mathrm{UPRT}_{\mathrm{teachers}}$$


Weighted participation rates in the teacher survey 


The weighted school participation rate in the teacher survey before replacement is calculated as: 


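$$\mathrm{WPRT}_{\mathrm{schools\_BR}} = \frac{\sum_{h}\sum_{i \in op}\sum_{l \in tfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC2T}_{hil} \times \mathrm{WGTADJ2T}_{hil} \times \mathrm{WGTFAC3T}_{hil}}{\sum_{h}\sum_{i \in fp}\sum_{l \in tfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTADJ1T}_{hi} \times \mathrm{WGTFAC2T}_{hil} \times \mathrm{WGTADJ2T}_{hil} \times \mathrm{WGTFAC3T}_{hil}}$$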


The weighted school participation rate in the teacher survey after replacement is calculated as: 


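$$\mathrm{WPRT}_{\mathrm{schools\_AR}} = \frac{\sum_{h}\sum_{i \in fp}\sum_{l \in tfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC2T}_{hil} \times \mathrm{WGTADJ2T}_{hil} \times \mathrm{WGTFAC3T}_{hil}}{\sum_{h}\sum_{i \in fp}\sum_{l \in tfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTADJ1T}_{hi} \times \mathrm{WGTFAC2T}_{hil} \times \mathrm{WGTADJ2T}_{hil} \times \mathrm{WGTFAC3T}_{hil}}$$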


The weighted teacher participation rate is therefore: 
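$$\mathrm{WPRT}_{\mathrm{teachers}} = \frac{\sum_{h}\sum_{i \in fp}\sum_{l \in tfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC2T}_{hil} \times \mathrm{WGTFAC3T}_{hil}}{\sum_{h}\sum_{i \in fp}\sum_{l \in tfp} \mathrm{WGTFAC1}_{hi} \times \mathrm{WGTFAC2T}_{hil} \times \mathrm{WGTADJ2T}_{hil} \times \mathrm{WGTFAC3T}_{hil}}$$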


The weighted overall participation rate in the teacher survey before replacement is calculated as: 


$$\mathrm{WPRT}_{\mathrm{overall\_BR}} = \mathrm{WPRT}_{\mathrm{schools\_BR}} \times \mathrm{WPRT}_{\mathrm{teachers}}$$


The weighted overall participation rate in the teacher survey after replacement is computed as: 


$$\mathrm{WPRT}_{\mathrm{overall\_AR}} = \mathrm{WPRT}_{\mathrm{schools\_AR}} \times \mathrm{WPRT}_{\mathrm{teachers}}$$
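The weighted teacher-survey rates follow the student-survey pattern, with the multiplicity factor WGTFAC3T present in every numerator and denominator. The sketch below is a hypothetical illustration (record layout, names, and values are assumptions). With equal weights the results reduce to simple proportions: 1 of 3 sampled schools before replacement, 2 of 3 after, and half of the sampled teachers.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class TeacherRecord:
    """One participating teacher in a school that participated in the teacher survey."""
    original_school: bool
    wgtfac1: float    # school base weight
    wgtadj1t: float   # school non-response adjustment (teacher survey)
    wgtfac2t: float   # teacher base weight
    wgtadj2t: float   # teacher non-response adjustment
    wgtfac3t: float   # teacher multiplicity factor


def weighted_teacher_rates(teachers: Iterable[TeacherRecord]) -> Dict[str, float]:
    num_br = num_ar = den_schools = num_t = den_t = 0.0
    for r in teachers:
        base = r.wgtfac1 * r.wgtfac2t * r.wgtadj2t * r.wgtfac3t
        den_schools += base * r.wgtadj1t     # school-rate denominator
        num_ar += base                       # all participating schools
        if r.original_school:
            num_br += base                   # originally sampled schools only
        num_t += r.wgtfac1 * r.wgtfac2t * r.wgtfac3t   # no WGTADJ2T here
        den_t += base
    schools_br, schools_ar = num_br / den_schools, num_ar / den_schools
    teachers_rate = num_t / den_t
    return {"schools_BR": schools_br, "schools_AR": schools_ar,
            "teachers": teachers_rate,
            "overall_BR": schools_br * teachers_rate,
            "overall_AR": schools_ar * teachers_rate}


# Hypothetical stratum: 3 sampled schools, 2 participating (1 original,
# 1 replacement), so WGTADJ1T = 3/2; in each school 2 of 4 sampled teachers
# responded (WGTADJ2T = 2), all teachers were taken (WGTFAC2T = 1), f = 1.
records = [TeacherRecord(orig, 10.0, 1.5, 1.0, 2.0, 1.0)
           for orig in (True, True, False, False)]
print(weighted_teacher_rates(records))
```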
Overview of participation rates in the teacher survey


Table 7.3 and Table 7.4 display the unweighted and weighted participation rates of all countries in
the teacher survey. Once more, discrepancies between the two tables indicate differential response
patterns between strata with disproportional sample size allocations. As described earlier, Germany
provides a prominent example of this effect.

Note that only those schools where at least 50 percent of their sampled teachers had completed
the survey were regarded as participants in the teacher survey. A school that did not meet this
requirement was regarded as a non-participating school in the teacher survey. The non-participation
of such a school affected the school participation rate (for the teacher survey), but not the
teacher participation rates, as the teachers from this school were not included in the calculation
of the teacher participation rates.


Table 7.3: Unweighted school and teacher participation rates—Teacher survey 


                                  School participation rate (%) Teacher        Overall participation rate (%)
                                  Before        After           participation  Before        After
Country                           replacement   replacement     rate (%)       replacement   replacement

Chile                             89.9          97.2            93.5           84.1          90.9
Denmark                           72.7          92.0            84.4           61.4          77.7
Finland                           97.3          97.9            92.2           89.7          90.3
France                            78.2          78.2            81.0           63.4          63.4
Germany                           73.8          79.5            85.3           62.9          67.8
Italy                             94.0          98.7            92.8           87.3          91.6
Kazakhstan                        100.0         100.0           99.9           99.9          99.9
Korea, Republic of                100.0         100.0           100.0          100.0         100.0
Luxembourg                        68.3          68.3            75.0           51.2          51.2
Portugal                          90.9          95.9            91.4           83.0          87.6
United States                     66.1          75.7            88.6           58.5          67.1
Uruguay                           66.9          70.3            75.3           50.4          53.0
Benchmarking participants
Moscow (Russian Federation)       98.7          100.0           100.0          98.7          100.0
North Rhine-Westphalia (Germany)  91.1          95.5            90.3           82.3          86.3
Table 7.4: Weighted school and teacher participation rates—Teacher survey


                                  School participation rate (%) Teacher        Overall participation rate (%)
                                  Before        After           participation  Before        After
Country                           replacement   replacement     rate (%)       replacement   replacement

Chile                             91.2          96.9            93.6           85.3          90.7
Denmark                           70.4          92.0            84.0           59.2          77.3
Finland                           97.8          98.0            92.5           90.4          90.7
France                            78.4          78.4            80.6           63.2          63.2
Germany                           63.[?]        70.5            81.7           [?]           57.5
Italy                             93.8          98.6            91.9           86.2          90.6
Kazakhstan                        100.0         100.0           100.0          100.0         100.0
Korea, Republic of                100.0         100.0           100.0          100.0         100.0
Luxembourg                        68.5          68.5            [?]            51.8          51.8
Portugal                          89.0          95.3            91.6           81.5          87.3
United States                     62.2          72.4            89.4           55.6          64.7
Uruguay                           69.5          74.1            74.5           51.8          55.2
Benchmarking participants
Moscow (Russian Federation)       97.6          100.0           100.0          97.6          100.0
North Rhine-Westphalia (Germany)  90.2          95.6            91.1           82.2          87.2


ICILS 2018 standards for sampling participation 


It is a significant challenge within countries to achieve full participation (i.e., 100%) in a large-scale
assessment, and the nature of these challenges varies across countries. Given that one essential
purpose of ICILS is to report data at the national level, it is necessary to adjudicate for each country
whether the achieved sample is sufficient to warrant scientifically defensible reporting of national
estimates. As is customary in IEA studies, ICILS 2018 established guidelines for reporting data
for countries with less than full participation. Adjudication of the data was done separately for
each participating country and each of the two different ICILS 2018 survey populations. This was
carried out in accordance with the recommendations of the sampling referee (Marc Joncas) and
in agreement with all members of the ICILS Joint Management Committee.


The first step of the adjudication process was to determine the minimum requirements for within-school participation.


Within-school participation requirements


In general, decreasing response rates increase the risk of biased results. Because very little
information about non-respondents was available, it was not possible to quantify the risk of bias
in estimates due to non-participation in most countries. To address this, and in addition to the
overall participation rate requirements described below, ICILS 2018 established strict standards
for minimum within-school participation: data from schools with a response rate of less than half
(50%) of sampled students or teachers, respectively, were discarded. This constraint meant that
not every student or teacher who completed a survey instrument was automatically considered
as participating and thereby contributing to the computation of population estimates.


The within-school response rate was computed separately for the student survey and the teacher
survey. Therefore, a school may count as participating in the student survey but not in the teacher
survey, or vice versa.
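The within-school rule acts as a filter applied before any participation rate is computed. A minimal sketch with hypothetical school IDs and counts (the function name is illustrative): a school below 50 percent drops out entirely, so its respondents no longer count toward the student (or teacher) response rate, while the school itself counts as non-participating in the school-level rate. The sketch assumes the counts refer to eligible sampled persons.

```python
from typing import Dict, Set, Tuple


def apply_within_school_rule(
        counts: Dict[str, Tuple[int, int]]) -> Tuple[Set[str], float]:
    """Apply the 50 percent within-school rule for one survey.

    counts maps a school ID to (respondents, eligible sampled persons).
    Returns the schools counted as participating and the unweighted
    within-school response rate computed over those schools only.
    """
    participating = {sid for sid, (resp, sampled) in counts.items()
                     if sampled > 0 and 2 * resp >= sampled}   # at least 50%
    if not participating:
        raise ValueError("no school meets the 50 percent requirement")
    resp = sum(counts[sid][0] for sid in participating)
    nonresp = sum(counts[sid][1] - counts[sid][0] for sid in participating)
    return participating, resp / (resp + nonresp)


schools = {"s1": (10, 20), "s2": (9, 20), "s3": (20, 25)}
kept, rate = apply_within_school_rule(schools)
print(sorted(kept), round(rate, 3))
```

Here school s2 (45 percent) is dropped, and the 9 students who did respond there are not counted; the response rate is taken over s1 and s3 only. Comparing with integers (`2 * resp >= sampled`) avoids floating-point edge cases at exactly 50 percent.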
Student survey participation requirements

Students were regarded as respondents if they replied to at least one task in the achievement
test. Please note, however, that the overall amount of partial non-response (i.e., omitted items in
questionnaires or tasks that had not been attempted) was minimal.

There is evidence that attendance and academic performance tend to be positively correlated
(Balfanz and Byrnes 2012; Hancock et al. 2013). Consequently, the likelihood of biased results
may increase as within-school response rates decrease.

Whenever there was evidence that the survey operation procedures in a school had not been
conducted following the established ICILS 2018 standards, the corresponding school was regarded
as a non-participant. For example, if a school failed to list all eligible students for the selection of
a student sample, thus causing a risk of bias due to insufficient coverage, the corresponding
school's student data were not included in the final database.


Teacher survey participation requirements

Teachers were regarded as respondents if they replied to at least one item in the teacher
questionnaire. But again, as was the situation with respect to the students, the overall amount of
partial non-response (i.e., omitted items in the questionnaires) was low.

It is possible that specific groups of teachers tend to be less likely to participate in a survey. In
order to help reduce non-response bias, a school was only regarded as a “participating school”
in the teacher survey if at least 50 percent of its sampled teachers participated. If the response
rate was lower, teacher data from this school were disregarded and the school was treated as
non-participating.

If a school failed to follow the survey operation procedures properly, it was classified as a non-participating
school. For example, if a school failed to list all eligible teachers for the teacher sample
selection, or if the standard teacher selection procedures had not been followed, then teacher data
from this particular school were not included in the final database.


Country-level participation requirements
Three categories for sampling participation were defined:

• Countries grouped in Category 1 met the ICILS 2018 sampling participation requirements.

• Countries in Category 2 met these requirements only after the inclusion of replacement schools.

• Countries in Category 3 failed to meet the ICILS 2018 sampling participation requirements.


Sampling participation categories for the teacher survey were identical to those for the student
survey. The results from ICILS 2018 show that high response rates were often harder to achieve
in the teacher survey than in the student survey. However, there is no statistical justification for
applying different sampling participation standards to the two surveys. Since non-response holds a high
potential for bias in both parts of the study, the participation requirements in the teacher survey
were identical to those in the student survey. No participation requirements were determined for
the reporting of school-level data; however, the participation rate in the school survey was above
85 percent for all countries that were placed in Category 1 or Category 2 for the student survey.


The three categories for sampling participation were defined according to the criteria presented
in Figure 7.1.
Figure 7.1: Participation categories in ICILS


Category 1: Satisfactory sampling participation rate without the use of replacement schools. 

In order to be placed in this category, a country has to have:

• An unweighted school response rate without replacement of at least 85 percent (after
rounding to the nearest whole percent) and an unweighted overall student/teacher response
rate (after rounding) of at least 85 percent

or

• A weighted school response rate without replacement of at least 85 percent (after rounding
to the nearest whole percent) and a weighted overall student/teacher response rate (after
rounding) of at least 85 percent

or

• The product of the (unrounded) weighted school response rate without replacement and the
(unrounded) weighted overall student/teacher response rate of at least 75 percent (after
rounding to the nearest whole percent).


Category 2: Satisfactory sampling participation rate only when replacement schools were
included.

A country will be placed in this category if it fails to meet the requirements for Category 1 but
has either an unweighted or weighted school response rate without replacement of at least
50 percent (after rounding to the nearest percent)

and has either

• An unweighted school response rate with replacement of at least 85 percent (after rounding
to the nearest whole percent) AND an unweighted overall student/teacher response rate
(after rounding) of at least 85 percent

or

• A weighted school response rate with replacement of at least 85 percent (after rounding
to the nearest whole percent) AND a weighted overall student/teacher response rate (after
rounding) of at least 85 percent

or

• The product of the (unrounded) weighted school response rate with replacement and the
(unrounded) weighted overall student/teacher response rate of at least 75 percent (after
rounding to the nearest whole percent).


Category 3: Unacceptable sampling response rate even when replacement schools are included. 


Countries that can provide documentation to show that they complied with ICILS sampling 
procedures, but do not meet the requirements for Category 1 or Category 2 will be placed in 
Category 3. 
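The decision rules in Figure 7.1 can be written as a small function. This is a sketch under stated assumptions, not the adjudication procedure itself: it reads "overall student/teacher response rate" as the student or teacher participation rate shown in the middle column of Tables 7.1 to 7.4, rounds halves up, feeds in rounded published rates although the figure specifies unrounded ones for the product criterion, and cannot reflect case-by-case decisions (such as the one described in footnote 5) or the documentation requirement for Category 3. All names are hypothetical.

```python
import math
from typing import NamedTuple


class Rates(NamedTuple):
    """Participation rates (%) for one country in one survey."""
    unw_school_br: float   # unweighted school rate before replacement
    unw_school_ar: float   # unweighted school rate after replacement
    unw_within: float      # unweighted student/teacher participation rate
    w_school_br: float     # weighted school rate before replacement
    w_school_ar: float     # weighted school rate after replacement
    w_within: float        # weighted student/teacher participation rate


def round_pct(x: float) -> int:
    """Round to the nearest whole percent, halves rounded up (assumed)."""
    return math.floor(x + 0.5)


def meets(unw_school: float, unw_within: float,
          w_school: float, w_within: float) -> bool:
    """True if any one of the three alternative criteria is satisfied."""
    return ((round_pct(unw_school) >= 85 and round_pct(unw_within) >= 85)
            or (round_pct(w_school) >= 85 and round_pct(w_within) >= 85)
            or round_pct(w_school * w_within / 100) >= 75)


def participation_category(r: Rates) -> int:
    if meets(r.unw_school_br, r.unw_within, r.w_school_br, r.w_within):
        return 1
    if (max(round_pct(r.unw_school_br), round_pct(r.w_school_br)) >= 50
            and meets(r.unw_school_ar, r.unw_within, r.w_school_ar, r.w_within)):
        return 2
    return 3


# Teacher-survey rates for three countries from Tables 7.3 and 7.4
for name, r in [("Chile", Rates(89.9, 97.2, 93.5, 91.2, 96.9, 93.6)),
                ("Denmark", Rates(72.7, 92.0, 84.4, 70.4, 92.0, 84.0)),
                ("France", Rates(78.2, 78.2, 81.0, 78.4, 78.4, 80.6))]:
    print(name, participation_category(r))
```

Applied to these teacher-survey rates, the sketch returns Category 1 for Chile, Category 2 for Denmark (only the product criterion after replacement, 92.0 × 84.0 / 100 ≈ 77, is met), and Category 3 for France.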




Reporting data 


The ICILS 2018 research team considered it necessary to make readers of the international
report aware of the increased potential for bias, regardless of whether such a bias was actually
introduced. Based on their respective sampling participation categories, national survey results
were reported as follows:


• Category 1: Countries in this category appear in the tables and figures in the international
reports without annotation.

• Category 2: Countries in this category are annotated in the tables and figures in the international
reports.

• Category 3: Countries in this category appear in a separate section of the tables.


For the student survey, nine countries and both benchmarking participants were reported as
meeting sampling participation requirements in Category 1 and were reported without annotation,
while six countries and the two benchmarking participants were in this category for the teacher
survey. Two countries were in Category 2 for the student survey⁵ and one country was in this
category for the teacher survey. One country had its student survey results reported in a separate
section of the tables as a Category 3 country, while for the teacher survey this was the case for
five countries. All ICILS 2018 countries and benchmarking participants had participation rates of
their originally sampled schools above 50 percent.


Table 7.5 lists the participation categories of each country for the student and the teacher surveys.


Table 7.5: Achieved participation categories by country


                                  Participation category
Country                           Student survey  Teacher survey

Chile                             [?]             [?]
Denmark                           [?]             [?]
Finland                           [?]             [?]
France                            [?]             [?]
Germany                           [?]             [?]
Italy                             [?]             [?]
Kazakhstan                        [?]             [?]
Korea, Republic of                [?]             [?]
Luxembourg                        [?]             [?]
Portugal                          [?]             [?]
United States                     [?]             [?]
Uruguay                           [?]             [?]
Benchmarking participants
Moscow (Russian Federation)       [?]             [?]
North Rhine-Westphalia (Germany)  [?]             [?]


5 Please note that Portugal is reported in Category 2 for the student survey because it only slightly missed the
required minimum participation rate.
ved guidelines on the survey operations procedures for each


stration of ICILS 2018 assessment depended heavily on the contributions of 
al research coordinators (NRCs) and national center staff. As is the situation 
for all large-scale cross-national surveys, ad 
ogistical aspects of the st 
try. These challenges were heigh 
tinstrum 


h the overall 
ges for each 


ds of administering the 


e developed 


ures to assist the NRCs and to aid uniformity 
t-administration activities. The international team designed these procedures 
participants and the high 


ICILS 2013 


EA's Progress in International 
cs and Science Stu 
dy (TIMSS),
stage of the


es advised on contacting schools, listing and sampling studen 


ts, preparing 
tion, administering the assessment, scoring the assessment, and creating 
ational centers also received materials on procedures for quality control and were 


asked to complete online questionnaires that asked for feedback on the survey activities. 


Field operations personnel 


The role of the national research coordinators and their centers 


One of the fi 


take when 
representati


RCs were 
where necessary, implemented and ad 


national con under the guidance of 


rst steps th 
estab 
t person for 
in charge of
t the international level.


at all countries or education systems participating in 
r country was to appoint an NRC. The 


all those involved in ICILS 2018 within the country and was 


the overall implementation of the study at the nation 


apted internationally agreed-upon p 


The role of school coordinators and test administrators 


al level. They a 
rocedures for 
the international project staff and national experts. 


CILS 2018 had to 
NRC acted as 


the 


the country 


the 


In order to facilitate the successful administration of ICILS 2018, the international team required the
establishment of two roles within countries: the school coordinator and the test administrator.
Their work involved preparing for the test administration in schools and carrying out the data
collection in a standardized way.
In cooperation with school principals, national centers identified and trained school coordinators
for all participating schools. The school coordinator could be a teacher or other staff member in
the school. The school coordinator could also be the test administrator at the school, but was not
to be a teacher of any of the sampled students. In some cases, national centers appointed external
individuals as school coordinators. The coordinators' responsibilities included the following major
tasks:

Identifying eligible students and teachers belonging to the target population to allow the national
center to perform within-school sampling;


Arranging the date(s) and modalities of the test administration, in particular the delivery method 


of the student 
Distributing in 
they were kep 
Working with 


administer the student 


Ensuring that 


The test admini 
questionnaire. T 


Accordingly, a training session was run by the nati 


test, with 


struments and related materials 
tin asecure place a 


the national center; 


the school principal, 


testing; an 


d 


the test administrators return al 
nly responsible for ad
al at all times;


nistrator, and 


testing ma 


he national ce 
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Field operations resources 


Manuals and documentation 


teri 
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the affected teachers to plan and 


als after the testing session. 


ministering the student test and 


nter or directly by the schools. 
ntrally or by the schools to make 
un the assessment sessions. 


The international study team released the ICILS 2018 survey operations procedures manuals to the NRCs in five units, each of which was accompanied by additional materials, including manuals for use in schools and software packages. All of this material was organized and distributed according to the stages of the study. The units and their accompanying manuals and software packages were:

• Unit 1: Sampling Schools specified the actions and procedures required to develop a national … international ICILS 2018 sample design.
• Unit 2: … information about how to work with schools in order to … the ICILS 2018 instruments.
• Unit 3: … described the processes involved in preparing the ICILS 2018 … countries.
• Unit 4: … Quality Monitoring Procedures dealt with the processes involved in … ICILS 2018 data collection in schools.
• Unit 5: Post Collection Data Capture, Data Upload, Scoring and Parental Occupation Coding provided guidelines on post-data collection processes and tasks. These included, but were not limited to, data capture from the paper questionnaires, uploading student assessment data, scoring …

The School Coordinator Manual, subject to translation, described the role and responsibilities of the school coordinator, the main contact person within each participating school.

The Test Administrator Manual, subject to translation, described the role and responsibilities of the test administrator, whose work included administration of the student assessment.

The National Quality Observer Manual provided national quality control observers with information about ICILS 2018, their role and responsibilities during the project, and the timelines, actions, and procedures to be followed in order to carry out the national quality observer programs.

The International Quality Observer Manual provided international quality observers with information about ICILS 2018, their role and responsibilities during the project, and the timelines, actions, and procedures to be followed in order to carry out the international quality observer program.

The Scoring Guides for Constructed-Response Items, subject to translation, provided detailed and explicit guidelines on how to score each constructed-response item.

The Compatibility Check and School Computer Resources Survey: Instructions for NRCs and Preparing Computers for ICILS - Instructions for School Coordinators Manual addressed whether computers in the sampled schools could be used for the ICILS 2018 assessment and whether special arrangements needed to be made in order to administer the assessment.


Software

• … used to administer the com… questionnaires. …
• IEA Data Management Expert (IEA DME): This software facilitated the entering of paper questionnaire data. The IEA DME software also allowed national adaptations to be made to the questionnaires and provided a set of data quality control checks.

In addition to preparing the software and manuals, IEA conducted data-management training designed to train national center staff on all procedures and the software supporting these procedures, namely IEA WinW3S, IEA DME, and the AM Examiner. This seminar was combined with a scoring training, during which national center staff were trained to use the AM Marker. Instructions for using the ICILS translation systems (the IEA and RM Results translation systems) were covered in one of the regular NRC meetings.


Field operations processes 


Linking students and teachers to schools 


The international project staff established a system to assign hierarchical identification codes (IDs). These uniquely identified and allowed tracking of the sampled schools, teachers, and students. Table 8.1 represents the hierarchical identification system codes.

Every sampled student was assigned an eight-digit ID number unique within each country. Each number consisted of the four-digit number identifying the school, followed by a two-digit number identifying the student group within the school (01 for all), and a two-digit number identifying the student within that group.

Each sampled target-grade teacher was assigned a teacher ID number consisting of the four-digit school number followed by a two-digit teacher number unique within the school.

Table 8.1: Hierarchical identification codes

Unit | ID components | ID structure | Numeric example
School (principal and ICT coordinator) | School (C) | CCCC | 1001
Student | School (C), Student group (G, constant: 01), Student (S) | CCCCGGSS | 10010101
Teacher | School (C), Teacher (T) | CCCCTT | 100101
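The ID structure in Table 8.1 is simple enough to express directly. The following Python sketch composes and parses IDs following the CCCC, CCCCGGSS, and CCCCTT patterns shown in the table; the function names and the range checks are illustrative assumptions, not part of the ICILS or IEA software.

```python
from typing import NamedTuple


class StudentID(NamedTuple):
    school: int   # CCCC: four-digit school number
    group: int    # GG: student group (constant 01 in ICILS 2018)
    student: int  # SS: student number within the group


def _field(value: int, width: int) -> str:
    """Zero-pad one ID component to a fixed width, rejecting values that do not fit."""
    if not 0 <= value < 10 ** width:
        raise ValueError(f"{value} does not fit in {width} digits")
    return f"{value:0{width}d}"


def make_student_id(school: int, student: int, group: int = 1) -> str:
    """Build the eight-digit CCCCGGSS student ID."""
    return _field(school, 4) + _field(group, 2) + _field(student, 2)


def make_teacher_id(school: int, teacher: int) -> str:
    """Build the six-digit CCCCTT teacher ID."""
    return _field(school, 4) + _field(teacher, 2)


def parse_student_id(code: str) -> StudentID:
    """Split a CCCCGGSS string back into its components."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"not an eight-digit student ID: {code!r}")
    return StudentID(int(code[:4]), int(code[4:6]), int(code[6:]))


def school_of(code: str) -> str:
    """Student and teacher IDs both start with the school ID, which links them to their school."""
    return code[:4]
```

For example, `make_student_id(1001, 1)` yields `"10010101"` and `make_teacher_id(1001, 1)` yields `"100101"`, matching the numeric examples in Table 8.1. Because the school number is the common prefix, linking a student or teacher to a school reduces to comparing the first four digits.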


Activities for working with schools 


In ICILS 2018, the within-school sampling process and the assessment administration required


close cooperation between the national centers and representatives from the schools, that is, the 
school coordinators and test administrators as described previously. Figure 8.1 presents the major 
activities the national centers conducted when working with schools to list and sample students 
and teachers, track respondents, prepare for test administration, and carry out the assessment. 


Contacting schools and within-school sampling procedures 


Once NRCs had obtained a list of the schools sampled for ICILS 2018 (for more information on sampling procedures, please refer to Chapter 6 of this report), it was important for the success of the study that national centers established good working relationships with the selected schools. NRCs were responsible for contacting the schools and encouraging them to take part in the assessment, a process that often involved obtaining support from national or regional educational authorities or other stakeholders, depending on the national context.
In cooperation with school principals, national centers identified and trained school coordinators for all participating schools. The school coordinator could be a teacher or guidance counselor in the school. In cases where the school coordinator also acted as the test administrator at the school, he or she was not allowed to be a teacher of any of the sampled students. In some cases, national centers appointed one of their own members to fill this role. Often this person was responsible for several schools in an area. Each school coordinator was provided with an ICILS School Coordinator Manual, which described their responsibilities in detail and encouraged them to contact the national center if they had any questions.


School coordinators were required to provide all necessary information about their respective schools and to coordinate the date, time, and place of the student assessment. School coordinators were also responsible for arranging modalities of the test administration with the national center, for example, regarding the use of school or externally provided computers. This work required them to complete the school computer resources survey, run the USB-based compatibility check, and send the results to the national study center. School coordinators were also responsible for obtaining parental permission as necessary, liaising with the test administrator to coordinate the test session, distributing teacher, school, and ICT coordinator questionnaires, and coordinating completion of the student tracking forms and teacher tracking forms. School coordinators also ensured that assessment materials were received, kept secure at all times, and returned to the national center after the administration.


National centers sent a student listing form to each school coordinator and asked them to provide 
information on all the eligible target-grade students in the school. School coordinators collected 
details about these students, such as their names (if country regulations allowed names to 
be provided to the national centers), birth month and year, gender, exclusion status,¹ and the
assessment language of the student (in case the national center provided different language 
versions of the student instruments). 


The national centers used this information to sample students within the schools. Listing all eligible 
students in the target grade was key to ensuring that every student in the target population had 
a known chance of being sampled, an essential requirement for obtaining random samples from 
all of the target-grade students at and across schools. 
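The requirement that every student have a known chance of selection can be made concrete. The sketch below draws an equal-probability sample without replacement from a complete student listing and reports the selection probability n/N shared by every listed student. Simple random sampling via Python's `random.sample` and the sample size `n` are stand-in assumptions for illustration; the actual within-school design is described in Chapter 6.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_listed_students(
    listed: Sequence[str], n: int, seed: Optional[int] = None
) -> Tuple[List[str], float]:
    """Draw n students without replacement from the complete listing.

    Returns the sampled IDs and the selection probability, which is the same
    for every listed student: n / N, or 1.0 when the school lists n or fewer.
    """
    if not listed:
        raise ValueError("the student listing form is empty")
    if n <= 0:
        raise ValueError("sample size must be positive")
    rng = random.Random(seed)  # a seeded generator lets a draw be reproduced and audited
    if n >= len(listed):
        return list(listed), 1.0
    return rng.sample(list(listed), n), n / len(listed)
```

A student who is missing from the listing form cannot be drawn at all, so their selection probability is zero rather than n/N; this is why a complete listing is the precondition for a probability sample.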


National centers also sent a teacher listing form to each school coordinator and asked them to provide
information on all the eligible target-grade teachers within the school. The school coordinators 
listed the eligible target-grade teachers and provided details about these teachers, such as their 
names (if country regulations allowed names to be provided to the national center), birth month 
and year, and gender. The national centers used the collected information to sample teachers 
within the schools. 


1 Although all students enrolled in the target grade were part of the target population, ICILS 2018 recognized that some 
student exclusions were necessary because of a physical or intellectual disability, or in cases of non-native language 
speakers without the language proficiency to complete the assessment. Accordingly, the sampling guidelines allowed 
for the exclusion of students with any of several disabilities (for more information on sampling procedures, please see 
Chapter 6). Countries were required to track and account for all students, and to flag those to whom exclusions applied.
Because the local definition of such disabilities could vary from country to country, it was important that the
conditions under which countries excluded students were carefully documented. 
Figure 8.1: Activities with schools

National center: Track school information
• Update school information, merge/obtain contact information
• Initialize IEA WinW3S: provide complete database information, import school sample database provided by IEA, translate and/or adapt survey tracking forms (e.g., student listing form)
• Record sampled school's participation status, use replacement if necessary
• Create student listing forms and teacher listing forms (printed or electronic) and send to school coordinators for completion

Schools: Within-school listing
• School coordinator lists all in-scope students on the student listing form
• School coordinator lists all in-scope teachers on the teacher listing form
• School coordinator sends the completed forms back to the national center

National center: Confirm assessment administration resources and method
• Set up system to record and follow up results of school resources surveys and software compatibility checks

Schools: Confirm assessment administration resources and method
• School coordinator arranges (or completes) USB-based school resources survey and software compatibility check

National center: Sample students and teachers
• Manually enter counts from student listing and/or teacher listing forms (number of students and teachers), create student and/or teacher records and enter information, OR import student listing and/or teacher listing forms directly
• Sample teachers
• Generate teacher tracking forms
• Sample students (includes assigning instrument rotation)
• Generate student tracking forms (paper and/or electronic)
• Print instrument labels for teacher, principal, and ICT coordinator questionnaires
• Send tracking forms and labeled survey instruments to schools

National center: Confirm assessment administration resources and method
• Confirm assessment administration process for each participating school based on information from the school

Schools: Assessment administration
• Test administrators track student participation on student tracking forms
• School coordinators track teacher participation on teacher tracking forms
• School coordinators/test administrators send the completed forms back to the national center

National center: Track student and teacher participation status
• Import/enter student participation information from student tracking forms
• Import/enter teacher participation information from teacher tracking forms
• Import student participation data availability status from test administration system
• Import online questionnaire data availability status from the IEA OSS Monitor
Preparing the computer-based test delivery at schools


Because ICILS 2018 was a computer-based assessment, it was necessary to test the computer 
resources available at participating schools to ascertain whether the school computer resources 
could be used to deliver the assessment. 


The compatibility check and school computer resources survey were administered in order to 
answer two questions: (i) if school computers could be used for the testing or if schools would 
need to be provided with computers able to do this task; and (ii) if, in those cases where the school 
computers could be used for testing, special arrangements would be needed (e.g., altering the 
configuration of computers or using a laptop with the local server connected to the school LAN) 
for the USB-based student test to run correctly. 
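The two questions above amount to a simple decision rule. The Python sketch below assumes that each computer's compatibility check yields one of three hypothetical outcomes ("pass", "pass_with_changes", "fail") and that a school needs a given number of working computers; the real check's outputs and any thresholds used by national centers are not specified in this chapter.

```python
from enum import Enum
from typing import Iterable


class Delivery(Enum):
    SCHOOL_COMPUTERS = "use school computers as they are"
    SCHOOL_COMPUTERS_SPECIAL = "use school computers with special arrangements"
    PROVIDED_COMPUTERS = "provide computers to the school"


def choose_delivery(check_results: Iterable[str], computers_needed: int) -> Delivery:
    """Pick a delivery method from per-computer compatibility outcomes (hypothetical labels)."""
    results = list(check_results)
    ready = results.count("pass")
    adjustable = results.count("pass_with_changes")
    # Question (i) yes, question (ii) no: enough computers run the test unchanged.
    if ready >= computers_needed:
        return Delivery.SCHOOL_COMPUTERS
    # Question (i) yes, question (ii) yes: enough computers run it once reconfigured
    # (e.g., altered configuration or a local server on the school LAN).
    if ready + adjustable >= computers_needed:
        return Delivery.SCHOOL_COMPUTERS_SPECIAL
    # Question (i) no: the school must be provided with computers.
    return Delivery.PROVIDED_COMPUTERS
```

The ordering reflects a preference for the least intrusive option: school computers as they are, then school computers with adjustments, and only then externally provided equipment.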


The process of administering the compatibility check and school resources survey required NRCs 
in non-English speaking countries to translate the school computer resources survey questions 
and to make them available along with the USB compatibility check file to school coordinators on 
a USB stick. 


After receiving the USB sticks containing the compatibility check files and instructions, school 
coordinators were required to: 


• Run the USB compatibility check on every computer that was to be used for the ICILS 2018 assessment;
• Complete one of the included school computer resources surveys per school; and
• Send the results back to the national study center.


This information on the availability and compatibility of the participating schools’ computers 
enabled national centers to determine the best test delivery method for each school. 


The national centers then sent the following items to each school: the necessary tracking forms,
labels, questionnaires (online or paper-based), and manuals as well as USB sticks matching the 
number of students listed on the student tracking form (plus three extra sticks). 


Administering the assessment at schools 


The process of distributing the printed materials and the electronic student instruments to the
schools required the national centers to engage in careful organization and planning. 


The national centers sent teacher questionnaires to each teacher listed on the teacher tracking 
form, in each school. They also sent a principal questionnaire to each school’s principal and an ICT 
coordinator questionnaire to each school’s ICT coordinator. 
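The quantities in the two paragraphs above can be expressed as a small packing calculation: one USB stick per student on the student tracking form plus three spares, one questionnaire per teacher on the teacher tracking form, and one each for the principal and the ICT coordinator. The `Shipment` structure and `plan_shipment` function are hypothetical illustrations, not national center tools.

```python
from dataclasses import dataclass

SPARE_USB_STICKS = 3  # "plus three extra sticks"


@dataclass(frozen=True)
class Shipment:
    usb_sticks: int
    teacher_questionnaires: int
    principal_questionnaires: int = 1
    ict_coordinator_questionnaires: int = 1


def plan_shipment(students_on_tracking_form: int, teachers_on_tracking_form: int) -> Shipment:
    """Count the materials a national center would pack for one school."""
    if students_on_tracking_form < 0 or teachers_on_tracking_form < 0:
        raise ValueError("counts cannot be negative")
    return Shipment(
        usb_sticks=students_on_tracking_form + SPARE_USB_STICKS,
        teacher_questionnaires=teachers_on_tracking_form,
    )
```

For a school with 20 students and 15 teachers on its tracking forms, `plan_shipment(20, 15)` gives 23 USB sticks and 15 teacher questionnaires.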


The national centers furthermore prepared and sent cover letters containing login information 
and instructions on how to complete the online questionnaire to all teachers, school principals, 
and ICT coordinators who had elected to complete their questionnaires online. National center 
staff sent the packaged materials to the school coordinators prior to the testing date and asked 
them to confirm the receipt of all instruments. School coordinators then distributed the school 
questionnaire and teacher questionnaires (or the cover letters for the online participants) while 
ensuring that the other instruments were kept in a secure room until the assessment date. 


In accordance with the international guidelines and requirements as well as local conditions, national centers assigned a test administrator to each school. In some cases, the school coordinator also acted as the test administrator. The test administrators received training from the national centers. Their responsibilities included: running a pretest administration on the day of testing in order to confirm that the student computers were prepared for the test; distributing materials to the appropriate students; logging in and initializing the test on the computers (either via the USB sticks provided by the national centers or the server method); leading students through the assessment; and accurately timing the sessions.
The student tracking forms indicated, for each sampled student, the assigned student instrument,
which consisted of the two test-item modules and the student questionnaire, administered via 
the ICILS 2018 Student Test Software. For ICILS 2018 countries administering computer and 
information literacy (CIL) test modules only (i.e., Chile, Italy, Kazakhstan, Uruguay, and Moscow, 
Russian Federation), administration of the assessment consisted of three parts, the first two 
of which required students to complete CIL test modules and the third to answer the student 
questionnaire. In countries participating in the computational thinking (CT) test modules (i.e., 
Denmark, Finland, France, Germany, Korea, Luxembourg, Portugal, the United States, and North 
Rhine-Westphalia, Germany), the test session consisted of five parts with both CT test modules 
following the two CIL test modules and student questionnaire parts. Test administrators were 
requested to document student participation on the student tracking forms. 


During the administration of the assessment, test administrators were required to provide a range of instructions to students. When administering some parts of the assessment, test administrators were asked to read instructions to the students as provided to them in the Test Administrator Manual, exactly as they appeared in the script. In other parts of the assessment, test administrators were required to read instructions from a script but had the option of modifying or adapting it to best suit a given situation.


In these instances, it was essential that the exact content and meaning of each script were conveyed to each set of students. Test administrators could use their own words only when the Test Administrator Manual did not include a script for the instructions, for example, when the manual explicitly advised administrators that they could answer any questions or points of clarification.


The time allotted for each part of the student testing and questionnaire administration was standardized across countries. In all countries, target-grade students were allowed 30 minutes to complete each of the two modules (60 minutes in total). Students who completed the assessment before the allotted time was over were allowed to review answers or read quietly, but were not allowed to leave the session. Students were given around 25 minutes to complete the student questionnaire and were allowed to continue if they needed additional time. Test administrators were required to document the starting and ending time of each part of the assessment administration on the test administration form.


In countries administering the two CT modules, after the regular CIL test and student questionnaire sessions (as described above), the test session was extended by an additional 25 minutes for each of the two CT test modules. Table 8.2 details the time allotted to the different parts of the student assessment.


Once the administration was completed, the school coordinators were responsible for collecting 
and returning all materials to their respective national center. 


Online data collection of school principal, ICT coordinator, and teacher questionnaires 


As in the previous cycle, ICILS 2018 offered participating countries the option of administering 
the principal, ICT coordinator, and teacher questionnaires online instead of in paper form. To 
ensure comparability of the data from the online and the paper modes, only those countries that 
had previously tested the online data collection during the ICILS 2018 field trial were allowed to 
use the online option during the main survey. All countries used the online administration mode 
for their schools. 


After the principal, ICT coordinator, and teacher questionnaires had gone through the translation 
and translation verification processes, they were prepared for delivery online using the IEA Online 
Survey System (IEA OSS) software as described in more detail in Chapter 5 of this report. 
Table 8.2: Timing of the ICILS assessment

Activities | Length
Preparation of students, reading of instructions, and administering the tutorial | 20 minutes (approx.)
Administering the CIL student assessment, first module | 30 minutes (exact)
Short break | 5 minutes (max.)
Administering the CIL student assessment, second module | 30 minutes (exact)
Short break | 5 minutes (max.)
Administering the student questionnaire | 20 minutes (approx.)
Longer break (CT only) | Between 15 and 45 minutes
Administering the CT student assessment, first module (CT only) | 25 minutes (exact)
Administering the CT student assessment, second module (CT only) | 25 minutes (exact)
Collecting the assessment materials and ending the session | 5 minutes (approx.)
TOTAL (CIL only) | 2 hours (approx.)
TOTAL (including CT) | 3.5 hours (approx.)
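The totals in Table 8.2 can be checked by summing the rows. The sketch below treats each approximate or maximum duration as a fixed number of minutes (an assumption) and carries the 15 to 45 minute CT break as a range.

```python
from typing import Tuple

# Minutes per part, taken from Table 8.2 (approximate and maximum values treated as fixed).
CIL_PARTS = {
    "preparation, instructions, tutorial": 20,
    "CIL module 1": 30,
    "short break 1": 5,
    "CIL module 2": 30,
    "short break 2": 5,
    "student questionnaire": 20,
    "collecting materials, ending session": 5,
}
CT_PARTS = {"CT module 1": 25, "CT module 2": 25}
CT_LONG_BREAK = (15, 45)  # minimum and maximum minutes


def session_length(include_ct: bool) -> Tuple[int, int]:
    """Return the (shortest, longest) session length in minutes."""
    base = sum(CIL_PARTS.values())
    if not include_ct:
        return base, base
    ct = sum(CT_PARTS.values())
    return base + ct + CT_LONG_BREAK[0], base + ct + CT_LONG_BREAK[1]
```

`session_length(False)` returns `(115, 115)`, just under the two-hour CIL-only total, and `session_length(True)` returns `(180, 210)`, so the "3.5 hours (approx.)" total corresponds to the longest permitted break.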


The IEA OSS is a hierarchical … information, including text pa… and information for data management.

The Designer component … questions and categori… web server to verify an…

… centers had to ensure that every respondent assigned to the online mode by default had the option to request and complete a paper questionnaire, regardless of the reasons for being unwilling or unable to answer online.


To ensure confidentiality and separation, every respondent received individual login information. 
The national centers sent this information, along with general information on how to access the 
online questionnaire, to respondents in the form of “cover letters.” In line with the procedures used 
during distribution of the paper questionnaires, the school coordinator delivered this information 
to the designated individuals. 


During the administration period, respondents could log in and out as many times as they needed 
and could resume answering the questionnaire at the question they had last responded to in 
their previous session. Answers were automatically saved whenever respondents moved to
another question, and respondents could change any answer at any time before completing the
questionnaire. During the administration, the national center was available for support; the center, 
in turn, could contact IEA if unable to solve a problem locally. 




The navigational structure of the online questionnaire had to be as similar as possible to that of 
the paper questionnaires. Respondents could use “next” and “previous” buttons to navigate to an 
adjacent page, as if they were flipping physical pages. In addition, a hypertext “table of contents” 
mirrored the experience of opening a specific page or question of a paper questionnaire. While most 
respondents followed the sequence of questions directly, these two features allowed respondents 
to skip or omit questions, just as if they were answering a self-administered paper questionnaire. 


To further ensure the similarity of the two sets of questionnaires, responses to the online questionnaires were not made mandatory, evaluated, or enforced in detail (e.g., using hard validations). Instead, some questions used soft validation, for example, questions asking for a numerical response with a minimum and maximum value, such as the total number of students enrolled in a school. In some instances, the response to this type of question was recorded as the respondent entered it, even if it fell outside the minimum or maximum value, with the caveat that it still had to fit within the specified width.
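A minimal sketch of the soft validation described above, assuming a numeric field with a minimum, a maximum, and a maximum number of digits (the width). Out-of-range values are kept with a warning, empty answers are accepted because responses were not mandatory, and only entries that are not whole numbers or do not fit the width are rejected. The function and field names, and the rejection of non-numeric text, are assumptions; this is not the IEA OSS implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldResult:
    value: Optional[int]
    accepted: bool
    warning: Optional[str] = None


def soft_validate(raw: str, minimum: int, maximum: int, width: int) -> FieldResult:
    """Check a numeric answer without forcing it into the expected range."""
    text = raw.strip()
    if text == "":
        # Responses were not mandatory, so an omitted answer is accepted.
        return FieldResult(None, True)
    if not text.isdigit() or len(text) > width:
        # The one hard limit (assumed here): a whole number that fits the specified width.
        return FieldResult(None, False, f"enter a whole number of at most {width} digits")
    value = int(text)
    if not minimum <= value <= maximum:
        # Soft check: keep the respondent's value but flag it.
        return FieldResult(value, True, f"{value} is outside the expected range {minimum} to {maximum}")
    return FieldResult(value, True)
```

With a hypothetical enrollment field of width 4 and an expected range of 1 to 3000, an entry of 5000 is kept with a warning, whereas 12345 is rejected because it does not fit the field.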


Certain differences in the representation of the two modes remained, however. To reduce response 
burden and complexity, the online survey automatically skipped questions not applicable to the 
respondent, in contrast to the paper questionnaire, which instructed respondents to proceed 
to the next applicable question. Rather than presenting multiple questions per page, the online 
questionnaire proceeded question by question. 
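Automatic skipping can be sketched as a list of questions, each with an applicability condition evaluated against the answers given so far. The question IDs and the filter condition below are hypothetical examples, not items from the ICILS 2018 questionnaires.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Answers = Dict[str, str]


@dataclass
class Question:
    qid: str
    # Returns True when the question applies, given the answers recorded so far.
    applies: Callable[[Answers], bool] = lambda answers: True


def next_applicable(questions: List[Question], current: int, answers: Answers) -> Optional[str]:
    """Return the ID of the next question that applies, or None at the end."""
    for question in questions[current + 1:]:
        if question.applies(answers):
            return question.qid
    return None


# Hypothetical filter: Q2 is shown only to respondents who answered "yes" to Q1.
QUESTIONS = [
    Question("Q1"),
    Question("Q2", applies=lambda a: a.get("Q1") == "yes"),
    Question("Q3"),
]
```

Where the paper version tells respondents which question to go to next, the online version evaluates the filter itself; a skipped question simply remains unanswered, which keeps the resulting data structure the same in both modes.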


While vertical scrolling was required for a few questions, particularly the longer questions with 
multiple “yes/no” or Likert-type items, horizontal scrolling was not. Because respondents could 
easily estimate through visual cues the length and burden of a paper questionnaire, the online 
questionnaires attempted to offer this feature through progress counters and a “table of contents” 
that listed each question and its response status. Multiple-choice questions were implemented 
with standard HTML radio buttons. 


Because the national centers were able to monitor the responses to the online questionnaires 
in real-time, they could send reminders to those schools where people had not responded in the 
expected period of time. Typically, in these cases, the centers asked the school coordinators to 
follow up with those individuals who had not responded. 
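With real-time monitoring, following up amounts to grouping outstanding respondents by school so that each school coordinator receives a single list. The record layout (school ID, respondent ID, completion flag) is a hypothetical simplification of the monitoring data, not the format used by the IEA OSS Monitor.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pending_by_school(records: Iterable[Tuple[str, str, bool]]) -> Dict[str, List[str]]:
    """Group respondents who have not completed their questionnaire by school."""
    pending: Dict[str, List[str]] = defaultdict(list)
    for school_id, respondent_id, completed in records:
        if not completed:
            pending[school_id].append(respondent_id)
    # Sorted lists give each school coordinator a stable, easy-to-check reminder list.
    return {school: sorted(ids) for school, ids in pending.items()}
```

Schools where every respondent has completed the questionnaire do not appear in the result, so no reminder is sent to them.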


Although countries using the online mode in ICILS 2018 faced parallel workload and complexity 
before and during the data collection, they had the benefit of a reduction in workload afterwards. 
Because answers to online questionnaires were already in electronic format and stored on servers 
maintained by IEA, there was no need for separate data entry. 
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Online data collection for survey activities questionnaires 


In order to collect feedback about survey operations from NRCs, the international project team set
up a survey activities questionnaire online. The questionnaire was prepared and administered using
the IEA OSS. As the survey activities questionnaire, unlike the other ICILS 2018 questionnaires, 
did not require national adaptations and was completed in English, it was well suited for online 
data collection. 


The purpose of the survey activities questionnaire was to gather opinions and information about 
the strengths and weaknesses of the ICILS 2018 assessment materials (e.g., test instruments, 
manuals, scoring guides, and software) as well as countries’ experiences with the ICILS 2018 survey 
operations procedures. NRCs were asked to complete these questionnaires with the assistance of 
their data managers and the rest of the national center staff. The information was used to evaluate 
survey operations. It is also being used to improve the quality of survey activities and materials 
used in future ICILS cycles. 


IEA sent the NRCs individual login information and internet links for accessing the online 
questionnaires. Before submitting the responses to IEA, NRCs could go back and change their 
answers if necessary. 


Scoring the assessment and checking scorer reliability 


Scoring the assessment 
The success of assessments containing constructed-response items depends on the degree to 
which student responses are scored reliably. Seventeen of the ICILS 2018 CIL assessment items 
were constructed-response items, and five large tasks were scored against a total of 37 criteria. 
One of the large-task criteria was automatically scored by the ICILS scoring system. Human scorers 
reviewed the automatically generated suggested score and could either accept or modify the 
score. Of the 102 ICILS 2018 CIL items, 53 were scored by human scorers, and it was critical to 
the quality of the ICILS 2018 results that these tasks were scored in a reliable manner. Reliability 
was accomplished by providing national centers with explicit scoring guides, extensive training 
of scoring staff, and continuous monitoring of the quality of the work during scoring procedures. 
There were 17 CT items for ICILS 2018. Two of these items were constructed response and 
scored by human scorers. 
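One reading that reconciles the counts above: if each large-task criterion counts as one item, and the automatically scored criterion is excluded from the human-scored total, the figures add up as shown below. This reconciliation is an inference from the numbers in the text, not a statement taken from the scoring documentation.

```latex
\begin{aligned}
\underbrace{17}_{\text{constructed response}}
  + \underbrace{37}_{\text{large-task criteria}}
  - \underbrace{1}_{\text{auto-scored criterion}} &= 53 \text{ human-scored CIL items},\\
102 - 53 &= 49 \text{ CIL items not scored by humans},\\
17 - 2 &= 15 \text{ CT items not scored by humans}.
\end{aligned}
```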


During the scoring training, which was conducted at the international level, national center staff 
members learned how to score the constructed-response items and to use the scoring criteria 
for the large-task items in the ICILS 2018 assessment. Scoring training took place before both 
the field trial and the main survey. The training that took place prior to the field trial provided the 
participants with their first opportunity to give extensive feedback on the scoring guides, which 
were then revised on the basis of this feedback. The training conducted before the main survey 
enabled national center staff to give additional feedback on the scoring guides, with that feedback
based on their experiences of scoring the field-trial items. The scoring guides for the three ICILS
trend modules were not revised and were identical to those used in the first ICILS cycle. Further
details of the development and revision of the ICILS 2018 main survey scoring guide for open-ended response items are provided in Chapter 2.


The main survey scorer training employed a sample set of student responses collected during the
field trial in English-speaking ICILS 2018 countries. The example responses used during scorer 
training were a mixture of those that clearly represented the scoring categories and those that 
were relatively difficult to score because they were partially ambiguous, unusually expressed, or on 
the “borderlines” of scoring categories. The scores that national center staff gave to these example 
responses were shared with the group, with discussion focusing on discrepancies in particular. The 
scoring guides and practice responses were refined following the scoring training to clarify areas 
of uncertainty identified during the scorer training. 
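Comparing the scores that staff gave to shared example responses, as in the training discussion above, can be sketched as pairwise percent exact agreement plus a list of responses that drew more than one distinct score. Percent exact agreement is used here only as a simple illustration; the text does not say which statistic, if any, was computed during training, and the data layout is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Scores = Dict[str, Dict[str, int]]  # scorer -> {response ID -> score}


def pairwise_agreement(scores: Scores) -> Dict[Tuple[str, str], float]:
    """Share of commonly scored responses on which each pair of scorers agreed exactly."""
    result: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(scores), 2):
        common = scores[a].keys() & scores[b].keys()
        if common:
            agreed = sum(scores[a][r] == scores[b][r] for r in common)
            result[(a, b)] = agreed / len(common)
    return result


def discrepant_responses(scores: Scores) -> List[str]:
    """Responses that received different scores from different scorers."""
    seen: Dict[str, Set[int]] = defaultdict(set)
    for by_response in scores.values():
        for response_id, score in by_response.items():
            seen[response_id].add(score)
    return sorted(r for r, distinct in seen.items() if len(distinct) > 1)
```

The discrepant responses are the natural candidates for group discussion, since they mark where the scoring categories were read differently.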


ICILS 2018 TECHNICAL REPORT 


Once training had been completed, the ISC provided national centers with a final set of scored
sample responses as well as the final version of the scoring guide. The scored sample responses
were accessible electronically through the web-based scoring system and were available only in
English. National centers used this information, as they saw fit, to train their scoring staff on how
to apply the scoring guides to the constructed-response items and large tasks. In some cases,
national centers created their own sets of example responses from the student responses collected
in their country.

To prepare for this task, the ISC provided national centers not only with suggestions on how
to organize staff, but also with materials, procedures, and details on the scoring process. The
ISC encouraged the national centers to hire scorers who were attentive to detail, familiar with
education, and who, to the greatest extent possible, had a background in CIL. The ISC also provided
guidelines on how to train scorers to score the items and tasks accurately and reliably.

Documenting scoring reliability

Documenting the reliability of the scoring process within countries was a highly important aspect
of monitoring and maintaining the quality of the ICILS 2018 scored data. To establish scoring
reliability within each country, two different scorers independently scored a random sample of
20 percent of responses for each constructed-response item and each large task.

The selection of responses to be double-scored and the allocation of these responses to scorers
were random and managed by the web-based scoring software. The software was set up to ensure
that a random selection of 20 percent of all responses was double-scored, and that scoring could
begin before all student responses had been uploaded to the system (thus allowing for late returns
of data from some schools). The software set-up also allowed these tasks to be accomplished
without compromising the selection probability of each piece of work for double scoring.

The degree of agreement between the scores, as assigned by the two scorers, provided a measure
of the reliability of the scoring process. The web-based scoring system was able to provide real-time
inter-rater reliability reports to scoring leaders, who were encouraged (but not required) to use
this information to help them monitor the quality of the scoring. Scoring leaders could, for example,
use the information to monitor the agreement of each scorer with their colleagues (and identify
scorers whose agreement was low relative to others), or identify items or tasks with relatively
low inter-rater reliability that might need to be rescored or to have scorers provided with some
additional training to improve the quality of their scoring.

Items with relatively low inter-rater reliability within a given country were not used in the estimation
of student achievement for that country. Chapter 11 outlines the adjudication process relating
to inter-rater reliability.

Field trial procedures

The ICILS 2018 field trial was a smaller administration of the ICILS 2018 assessment; on average,
approximately 1000 students were tested in each participating country. The international field
trial was conducted from May to June 2017.

The field trial was crucial to the development of the ICILS 2018 assessment instruments and also
served the purpose of testing the ICILS 2018 survey operations procedures in order to avoid any
possible problems during the ICILS 2018 data collection.

The operational resources and procedures described in this chapter were used during the field trial
under conditions approximating those of the main survey. This process also allowed national centers
to refine their data collection procedures. The field trial resulted in some important modifications
to survey operations procedures and contributed significantly to the successful implementation
of ICILS 2018.
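The double-scoring design described under Documenting scoring reliability can be illustrated with a short Python sketch. It is not the web-based scoring software used in ICILS 2018, whose internals are not described here. The sketch assumes that each uploaded response is flagged independently with probability 0.20, one simple way to give every piece of work the same selection probability while letting scoring start before late data arrive, and it assumes percent exact agreement between the two scorers as the agreement measure. The function names and item IDs (CR01, CR02) are hypothetical.

```python
import random
from collections import defaultdict

DOUBLE_SCORE_RATE = 0.20  # 20 percent of responses are double-scored


def flag_for_double_scoring(response_ids, rng):
    """Flag each response independently as it is uploaded.

    Every response has the same 0.20 chance of selection no matter when it
    arrives, so scoring can begin before late returns are in. The realized
    share is close to, but not exactly, 20 percent.
    """
    return {rid: rng.random() < DOUBLE_SCORE_RATE for rid in response_ids}


def percent_agreement(pairs):
    """Percent exact agreement for a list of (score_a, score_b) pairs."""
    if not pairs:
        return float("nan")  # undefined when nothing was double-scored
    agreed = sum(1 for a, b in pairs if a == b)
    return 100.0 * agreed / len(pairs)


def agreement_by_item(records):
    """Group (item_id, score_a, score_b) records and report agreement per item."""
    by_item = defaultdict(list)
    for item_id, score_a, score_b in records:
        by_item[item_id].append((score_a, score_b))
    return {item_id: percent_agreement(pairs) for item_id, pairs in by_item.items()}


if __name__ == "__main__":
    rng = random.Random(2018)  # fixed seed for a reproducible illustration
    flags = flag_for_double_scoring(range(1000), rng)
    print(sum(flags.values()), "of 1000 responses flagged for double scoring")

    # Hypothetical double-scored records: (item, first score, second score)
    records = [("CR01", 1, 1), ("CR01", 0, 1), ("CR02", 2, 2), ("CR02", 1, 1)]
    print(agreement_by_item(records))  # {'CR01': 50.0, 'CR02': 100.0}
```

Because each response is flagged independently, the realized share only approximates 20 percent. A fixed-size draw per item would hit 20 percent exactly but could only be made once all responses for that item had been uploaded, which is the constraint the independent-flagging approach avoids.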
Summary


Considerable efforts were made to ensure high standards of quality in the survey procedures for
the ICILS 2018 data collection. National research coordinators (NRCs) played a key role in implementing
the data collection in each participating country, during which they followed internationally agreed
survey operations procedures. The international study consortium provided NRCs with a comprehensive
set of manuals containing detailed guidelines for the preparation of the study, its administration, the
scoring of open-ended questions, and data processing. National centers also received tailored software
packages for sampling and tracking students and teachers within schools, for the computer-based
student assessment, for data capture, and for the online administration of contextual questionnaires.
The international ICILS 2018 field trial in 2016-2017 was crucial for testing survey operations
procedures in participating countries and contributed to the successful implementation of the
main data collection.
Selecting and training international quality observers (IQOs)
IQOs are independent experts who have experience with school environments and, ideally, the ICT
context, who reside in the participating country, and who have no affiliation with the national study
centers. To facilitate the recruitment process, IEA asked NRCs to nominate two candidates to serve as
IQOs for ICILS 2018 in their respective countries, and to send IEA the candidates’ CVs and a short
explanation of why they considered the nominees suitable for the position. Potential
candidates had to be familiar with school environments or the day-to-day operations of schools
and needed to be ICT literate. Additionally, they needed to be fluent in English and in the administered
language(s) of the target country. IQOs were also not to have any professional or
personal affiliation with the national study center. Potential candidates were, for instance, school
inspectors, relevant ministry officials, retired school teachers, or school principals. After carefully
reviewing the CVs and considering the NRC’s recommendation, IEA selected the most suitable
candidate for the IQO position in every country.


Following the selection process, IEA invited the selected candidates to Amsterdam for a one-and-a-half-day
in-person training. During the training, they were familiarized with the content of the study,
the general procedures, and their tasks and responsibilities as IQOs. They were also informed about
the possibility of hiring IQO assistants to reduce their workload. This was particularly advised in
the case of large countries, in order to cover different regions, states, or territories of the country,
or in case of a very short data collection window. If IQOs decided to appoint an assistant, they
were solely responsible for the assistant’s training, communication, payment, and any other organizational
tasks. However, IQOs needed to inform IEA about their assistants and were required to submit
confidentiality agreements signed by the assistants. In addition to the training, IQOs received the
following materials, which they needed in order to familiarize themselves with the ICILS 2018 procedures
and their responsibilities, and to complete their documentation:


• International Quality Observer Manual
• International versions of the principal, teacher, and ICT coordinator questionnaires
• National adaptation form (NAF) template
• International version of the School Coordinator Manual
• International version of the Test Administrator Manual
• School visit travel form
• School visit tracking form
• Administration observation record
• Translation verification report
• Checklist for collecting materials from the NRC
• Confidentiality agreement to be signed by IQO assistants


In addition to the information provided to IQOs during the training seminar, the above-listed
materials contained all the instructions needed to serve as an IQO for ICILS 2018. After the
training seminar and during the IQOs’ fieldwork, IEA remained in close contact with the IQOs
and provided further assistance and instructions if required, especially in case of unforeseen
circumstances.
Overview of IQOs’ responsibilities
The IQOs’ tasks and responsibilities were described in great detail in the IQO manual
and conveyed during the training seminar. Their main responsibility was to monitor the data
collection activities in their countries, document their findings, and report them to IEA. More
precisely, IQOs’ tasks covered the following:


• Familiarizing themselves with the ICILS content and context
• Visiting the NRC and collecting national instruments and other materials
• Selecting 15 schools for international quality control
• Contacting the schools selected for international quality control and arranging visits
• Observing an ICILS testing session in each selected school
• Interviewing the school coordinator and test administrator of each selected school
• Completing the translation verification report
• Completing the documentation and reporting to IEA


Visiting the national research coordinator (NRC) 


Before the start of the data collection period for the ICILS 2018 main survey, IQOs were required
to visit the NRC in their country. During this visit, IQOs had to select, in collaboration with the NRC,
a subsample of schools for the observation of the student assessments. In addition, they had
to collect the national materials needed to conduct the school visits and a set of the national
study instruments. NRCs provided IQOs with the following materials:


• USB flash drive containing the national version(s) of the student test and questionnaire
• Printed or digital versions of the ICT coordinator, principal, and teacher questionnaires
• Printed copies of the national version(s) of the School Coordinator Manual for each administered
  language
• Printed copies of the national version(s) of the Test Administrator Manual for each administered
  language
• Student listing and tracking forms for each selected school
• Teacher listing and tracking forms for each selected school
• Contact information for the selected schools


The listed forms and documents were needed as a reference during the testing session observations,
for the interviews with the school coordinators and test administrators, and to complete the IQOs’
documentation. After IQOs completed their tasks and fulfilled their contract, they were asked to
send all materials that they had received from the NRC to IEA.


The task of determining the subsample of schools for international quality control involved
selecting 15 schools, plus three extra schools as replacements in case the IQO had difficulties
contacting the initially selected schools or encountered other unforeseen circumstances. Ideally,
and to the extent possible, the schools were chosen through a random selection process. However,
a random selection could be subject to a number of practical constraints. Therefore, IQOs were
allowed to exclude schools beyond a reachable driving distance from where they or their
assistants resided. Another practical constraint was that the school should not already
have been selected for participation in the national quality control program (see the Survey activities
questionnaire section below for more information about this program). Despite these constraints,
IQOs were asked to attempt to visit schools in different areas or regions of their country in order
to ensure reasonable geographical coverage. In the case of very large countries, IQOs were strongly
advised to appoint assistants for this purpose. Following the selection of the schools to be visited
for international quality control, IQOs needed to send their selection to IEA for approval. For this
notes in an online version of the administration observation record after every school visit. The
online version of the administration observation record was accessible through the IEA Online 
Survey System and all information needed to be entered in English. 
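The selection rules for the international quality control subsample described earlier in this section (15 schools plus three replacements, drawn at random from schools that are within reachable driving distance and not already in the national quality control program) can be sketched in Python as follows. The School record and its fields are hypothetical, and the sketch does not enforce the geographical spread IQOs were asked to aim for, which depended on their judgment and on IEA approval.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class School:
    school_id: str
    region: str
    within_driving_distance: bool  # reachable from where the IQO or an assistant lives
    in_national_qc: bool           # already selected for national quality control


def select_iqo_schools(schools, rng, n_main=15, n_replacements=3):
    """Draw main and replacement schools at random from the eligible pool."""
    eligible = [
        s for s in schools
        if s.within_driving_distance and not s.in_national_qc
    ]
    needed = n_main + n_replacements
    if len(eligible) < needed:
        raise ValueError(f"only {len(eligible)} eligible schools, {needed} needed")
    drawn = rng.sample(eligible, needed)  # simple random sample without replacement
    return drawn[:n_main], drawn[n_main:]


if __name__ == "__main__":
    # Hypothetical pool of 60 sampled schools in four regions
    pool = [
        School(f"S{i:03d}", f"R{i % 4}", i % 5 != 0, i % 7 == 0)
        for i in range(60)
    ]
    main, replacements = select_iqo_schools(pool, random.Random(7))
    print(len(main), len(replacements))  # 15 3
    print(sorted({s.region for s in main}))  # regions covered by the main selection
```

Drawing all 18 schools in one sample and splitting it afterwards guarantees that the replacements never duplicate a main school.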
While observing, IQOs were asked to use the administration observation record to structure their
observation of the testing session and document their findings. The administration observation
record included multiple-choice and open-ended questions and consisted of four content
areas: (a) the ICILS administration arrangements and settings; (b) the ICILS administration process;
(c) summary observations and general impressions of the ICILS administration; and (d) the interview
with the school coordinator and test administrator. The following sections present the data derived
from these administration observations.


ICILS administration arrangements


The first section of the administration observation record asked about the assessment conditions
and general preparations regarding the ICILS administration. It covered the setup of the room
where the session took place and the arrangement of assessment materials. According to the
answers in the administration observation record, around two thirds of the IQOs (67%) met the
school coordinator upon arriving at the school, before the assessment room was prepared.
The preparation of the assessment room began about 60 minutes before the testing session started
in around half of the schools (49%) and even earlier in approximately a third of the cases (36%).
IQOs also reported in most cases (91%) that the testing materials were safely stored and
securely sealed when they arrived at the school.


Generally, the testing rooms seemed in order and well prepared in the majority of the visited
schools. Table 9.1 shows detailed answers about the set-up of the testing rooms and preparations,
including the seating space for students, the set-up of the workstations, and preparatory
actions taken by the test administrator.
Table 9.1: Preparation of the testing room

                                                                        Response category (%)
Question                                                                Yes    No     Missing

Did the test administrator receive an adequate supply of USB flash      99     1      -
drives? (if applicable)

Is there adequate seating space for students to avoid unwanted          95     5      -
distractions while testing is in progress?

Is there adequate space for the test administrator to move around       98     2      -
the room while the testing is in progress?

Does the test administrator have a watch or use another type of         97     3      -
device to keep track of time while the testing is in progress?

Did the test administrator set up all workstations with each of them    81     19     -
displaying the welcome screen prior to the students’ arrival?

Is the test administrator ensuring that each student is sitting in      93     6      1
front of the computer specially prepared for him or her?

Note: Percentages derived from a total of 195 responses.
                                    Response category (%)
Script type                         No changes   Minor changes   Major changes   Missing
ICILS assessment                    68           28              4               -
Computational thinking modules*     76           23              1               -
Student questionnaire               76           20              3               2

Notes: Percentages derived from a total of 195 responses. Because results are rounded to the nearest whole
number, some totals may appear inconsistent.
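The note under the table explains why a row, such as the student questionnaire row, can sum to 101 percent. A minimal Python illustration, using hypothetical counts (the actual counts behind the table are not reported here) chosen so that they total 195 responses:

```python
def rounded_percentages(counts):
    """Convert category counts to whole-number percentages of their total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


# Hypothetical counts for four response categories, totalling 195 responses
counts = [148, 39, 5, 3]
pcts = rounded_percentages(counts)
print(pcts, sum(pcts))  # [76, 20, 3, 2] 101
```

Each category is rounded on its own (75.9 becomes 76, 2.6 becomes 3, 1.5 becomes 2), so the small upward roundings add up to one extra percentage point. Python's built-in round() rounds exact halves to the nearest even number; none of the values here is an exact half.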


Concerning the computers or tablets used for the ICILS administration, IQOs reported that in
almost all cases (92%) the modules displayed on the screens during the ICILS assessment
matched the information listed on the student tracking form for each student in the
room. In almost all observed testing sessions (99%), exiting the student testing software
worked properly and the screens reverted to the initial login screen.
As for students’ compliance with the allocated time for the administration, IQOs reported that the
majority of the students (92%) stopped working immediately after the allowable time period for the 
ICILS assessment ended. For countries that administered the computational thinking (CT) module, 
the majority of the students (96%) stopped working immediately after the allocated time period 
for the CT modules ended as well. In nine percent of the cases, there were students in the testing 
session who completed the assessment before the end of the allowable time period. Regarding the 
administration of the student questionnaire, approximately a third of the students (32%) asked for 
additional time to complete the questionnaire. In these cases students were given between one and 
30 additional minutes to complete the student questionnaire (on average around seven minutes). 


General impressions of the IQO


The third section of the administration observation record was dedicated to general impressions
and observations of the IQO during the ICILS administration. In general, IQOs described the
overall quality of the ICILS administration as excellent in about half of the cases (46%), as very good
in around a third of the cases (35%), and as good in around an eighth of the cases (13%). A few
sessions were rated only fair (5%) or poor (1%). Most IQOs also had the impression that test
administrators had familiarized themselves with the procedures and scripts prior to the test
administration (95%).


Concerning the technical infrastructure at the schools, IQOs did not observe many technical
issues (see Table 9.3). In some cases, problems occurred when logging in, when using the USB flash
drives, or in the form of other malfunctions of the ICILS assessment delivery system. These other
malfunctions included error messages, sudden interruption or locking of a module, and frozen screens.
IQOs reported that in some testing sessions tablets had to be changed, USB flash drives replaced,
or computers restarted to resolve the problems that occurred. IQOs considered the actions taken
by the test administrators to solve problems effective most of the time (95%).


Table 9.3: Technical problems experienced with procedures or devices

                                                     Response category (%)
Procedures/devices                                   Yes    No     Missing
Logging in                                           13     87     -
Using USB flash drives                               al}    G2     15
Timer function                                       2      97     1
Other malfunctions when using the delivery system    21     78     1
Keyboard (switching languages)                       3      94     3

Note: Percentages derived from a total of 195 responses.


Further, IQOs were under the impression that test administrators recorded students’ attendance
on the student tracking form correctly (100%). In total, there were only a few sessions with students
who refused to take the assessment either prior to or during the administration (5%). IQOs reported
that in about three quarters of the observed sessions (76%) students did not have particular
problems with the ICILS administration. Where there were issues, IQOs reported students getting
tired or losing concentration due to the length of the assessment. Furthermore, it appears that some
students had difficulty understanding certain exercises. Overall, IQOs considered the students in
around two thirds of the observed testing sessions extremely orderly and cooperative (68%)
and in a little under a third of the sessions (28%) at least moderately orderly and cooperative.
In most of the sessions (87%), IQOs did not observe students attempting to cheat or students who
did not pay attention. In the case of non-cooperative and disorderly students, IQOs reported that
most of the test administrators made an effort to control the students and the situation (84%).
Interview with school coordinators and test administrators


The final section of the administration observation record was dedicated to interviews with the
school coordinator and test administrator. The purpose of these interviews was to obtain their
evaluation of the ICILS administration, gather suggestions for improvement, and acquire additional
background information.


School coordinators were mainly internal school staff. Fifty-one percent were school principals
or other members of school management, 37 percent were teachers in the school where the
assessment took place, and nine percent were other school staff members. In contrast, about a
third of the test administrators were internal to the school where the assessment took place and
about two thirds were external. More precisely, they were principals (8%), teachers of ICT (10%),
teachers of another subject (13%), other school staff (6%), ICILS national center staff (17%), or
held an external position (46%).


When asked about the overall quality of the ICILS administration, 77 percent of all school
coordinators said that it went very well and without problems, and 16 percent answered that it went
satisfactorily, with only a few problems. Where there were problems, school coordinators sometimes
described difficulties with organizing and arranging the administration. In some schools, there were
technical issues or problems with using technology. In most cases, school coordinators rated the
general attitude of other school staff members towards the ICILS assessment as positive (58%) or
neutral (32%). Furthermore, school coordinators reported to IQOs that approximately half
of the students (54%) received some sort of special instructions, motivational talks, or incentives
to prepare them for the assessment.


With regard to the School Coordinator Manual, the majority of the school coordinators (89%) felt
that the manual worked well and did not need improvement. However, some suggestions for
improvement were given, including shortening the manual, improving its structure, and providing
clearer instructions in certain areas. In addition to the School Coordinator Manual, around half of the
school coordinators (47%) received some sort of additional training, instructions, motivational talks,
or incentives from the national centers. Regarding the Test Administrator Manual, three quarters of
the test administrators (75%) felt the manual worked well and did not require improvement. Some
test administrators suggested improvements that included more specific explanations for certain
aspects of the assessments, clearer instructions, or reducing redundancy.


IQOs asked school coordinators about various forms used during the ICILS administration and
whether the information on these forms was correct. Table 9.4 shows that teacher listing forms and
student listing forms contained correct information in the majority of the schools, and that the
teacher questionnaires or cover letters were distributed according to the teacher tracking form
in almost all cases.


Table 9.4: Teacher and student listing forms and teacher tracking forms used for the assessment

                                                                        Response category (%)
Question                                                                Yes    No     Missing

Did the teacher listing form include all eligible teachers as listed    93     5      2
in the school timetable for the target grade?

Were all students participating in the ICILS assessment listed on       93     4      3
the student listing form?

Were the teacher questionnaires/cover letters distributed               85     9      7
according to the teacher tracking form?

Notes: Percentages derived from a total of 195 responses. Because results are rounded to the nearest whole
number, some totals may appear inconsistent.
Translation verification report




All national study instruments underwent a thorough translation verification process before
administration (see Chapter 5 for more information). The purpose of translation verification
was to ensure that the translated instruments remained equivalent to the
international source version of the instruments. Upon completion of translation verification, the
instruments containing the translation verifiers’ comments and suggestions were released back to
the NRC. The NRC reviewed the verifier’s feedback and had the final decision on whether or not to
adopt the recommendations. The consortium requested all NRCs to respond to the verifier’s
feedback and to document whether they agreed or disagreed with the verifier’s suggestions or
wanted to modify the translation further.
Table 9.5: School coordinator information

Question                                                                          Response category (%)

How did you train the school coordinators?                                        Yes
   Formal training session                                                        5
   Through telephone, email, or video-link                                        4
   Written instructions                                                           8
   Other

Question                                                                          Yes     No
Did the school coordinator report difficulties with understanding any of the following aspects of
ICILS administration?*
   Identifying eligible teachers and/or students
   The necessity for listing all target grade teachers
   The necessity for listing all target grade students                            4
   Flagging students to be excluded prior to school sampling
   The rationale for sampling students from the target grade across classrooms
   The process for running the USB compatibility tests in schools
   The steps to set up computers in schools to support successful test            3       9
   administration
   The test administration procedures

Notes: Percentages derived from a total of 13 responses.
* One country did not answer this question.
for administration. A few countries (4) reported having difficulties during the layout verification
process of the ICILS delivery system for the student instruments. These difficulties were mainly due
to the functionality and usage of the system. The majority of the countries (12) did not report any
major problems with the verification process for the online questionnaires. Most of the
countries did not experience major difficulties when downloading and replicating the ICILS delivery
system to the USB flash drives, which included downloading the test file image (13), extracting
the test file to a USB flash drive (12), and creating copies of the USB flash drives for use in the testing (11).


Administering ICILS
Some countries (4) reported problems with the web application during the
administration of the online questionnaires. Additionally, five countries observed some issues with
the login procedure during the administration of the online questionnaires. These included issues
with accessing the online questionnaire, or respondent identification (ID) numbers and passwords
that did not work properly. Problems during the administration of the ICILS assessment included
malfunctions of the delivery system (e.g., freezing of the screen), which in some cases led to the
rebooting of the delivery system, or other technical issues (e.g., corruption of USB flash drives,
issues with the firewall).


Concerning the participation of the ICILS populations, some national study centers experienced
difficulties in achieving high participation rates. Six countries reported problems with student
participation, mainly due to difficulties with obtaining parental consent or to student refusal. Eight
countries experienced problems with teachers’ participation. Reasons included teachers’ workload,
lack of motivation, or lack of support from teacher unions. Four countries reported problems with
ICT coordinators’ and principals’ participation rates.


Scoring open-ended response items 


Scoring activities for open-ended response items in the participating countries were mainly
performed by national center staff (3), teachers or professional educators (9), or university students
(5). On average, 12 scorers were used per country. A few countries (5) reported that their scorers
had difficulties using the scoring system for training and scoring the student work, which included
problems with accessing the software, the speed of the application, or displaying pictures appropriately.


Further, six countries reported difficulties with the reports for reliability scoring. 


Entering and coding occupation data 


For coding occupations, most countries mainly used their own staff members (9). In addition, a
few countries used experts from external organizations (3) or university students (1). On average,
countries used four coders for coding the occupation data. Most of the national study centers
(12) made use of the International Standard Classification of Occupations (ISCO-08) scheme for
coding occupations and used either existing translations or their own translations of the scheme.
None of the countries reported problems with converting the national coding scheme to ISCO-08.
Seven countries reported having difficulties with coding the student responses, and one country
reported problems working with the IEA Coding Expert software.

Entering data manually and submitting data

Entering data manually was only applicable for three countries, as all other countries administered
the teacher, principal, and ICT coordinator questionnaires solely online. One of these three
countries used the IEA Data Management Expert (IEA DME) software to enter the data derived
from the questionnaires.

study center.


National quality control activities 


In addition to the international quality control program, quality control at the national level also
took place during the data collection. National study centers were responsible for conducting a
national quality control program, the composition of which was at the discretion of the national
study centers. However, the consortium provided recommendations and guidelines to NRCs in the form
of a National Quality Observer Manual. The recommendations included recruiting and
training national quality observers (NQOs), visiting 10 percent (or a minimum of 15) of the sampled
schools to observe a testing session, and interviewing the school coordinator and test administrator
in each school. The manual included instructions for NQOs and a template ICILS administration
observation record. However, countries could decide whether they would use the manual and could
adjust it where necessary.
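The recommended number of national quality control visits can be expressed as a small function. This Python sketch assumes the recommendation means the larger of 10 percent of the sampled schools (rounded up) and 15, capped at the number of sampled schools; the manual's exact rounding rule is not stated in this section, and the function name is hypothetical.

```python
def nqo_visit_target(n_sampled_schools, share_percent=10, minimum=15):
    """Recommended number of schools for national quality observers to visit.

    Assumption: "10 percent (or a minimum of 15)" is read as the larger of
    the two, with 10 percent rounded up; the total is capped at the number
    of sampled schools.
    """
    ten_percent = (n_sampled_schools * share_percent + 99) // 100  # integer ceiling
    return min(max(minimum, ten_percent), n_sampled_schools)


for n in (120, 150, 151, 400):
    print(n, nqo_visit_target(n))
# prints: 120 15 / 150 15 / 151 16 / 400 40 (one pair per line)
```

Integer arithmetic is used for the ceiling so that values such as 10 percent of 150 are not affected by floating-point rounding.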


All 13 participants reported that they conducted national quality control. However, specific
national quality control activities varied from country to country. Each country appointed one or
more NQOs, who were mainly staff members of the national center, external experts, or school
inspectors. NQOs were trained in multiple ways, including formal training sessions (8), instructions
via telephone, email, or videos (2), and written instructions (6). A minimum of six schools was visited
in the course of the national quality control program in each country. Two countries reported
conducting some sort of national quality control activities in all sampled schools, which included,
for example, regular check-in calls with the test administrators. In one of these two countries, every
testing session was observed on-site by a staff member of the national study center. Most national
centers (12) reported that the visited schools were located in different regions of the country. The
majority of the countries (11) made use of at least parts of the National Quality Observer Manual
provided by the consortium. Adjustments to the manual included reducing the length of the
document or adapting it to the national context. The manual was also used as a basis for producing
training materials. Issues reported by NQOs included technical issues such as freezing or crashing
of the software, or problems using the USB compatibility check. NQOs further reported minor
procedural deviations, which included non-adherence to the allocated time for the testing session.
They also provided feedback regarding the teacher and student tracking forms used during the
administration and made suggestions for improvements.
Summary


The ICILS consortium, in cooperation with the participating countries, developed and coordinated 
range of quality assurance measures in order to monitor the quality of the study administration. 
his chapter focused on two quality assurance activities that occurred during the main survey 
for ICILS 2018, namely the international quality control program and the SAQ. The information 
presented focused on the structure and general approach of these two components as well as the 
most important results. 


ied) 


The |QOs were appointed to observe 15 testing sessions of the ICILS student assessment in each 
participating country and interview the school coordinator and test administrator. Their main task 
was to monitor compliance with the internationally standardized procedures and report their 
findings to IEA. The SAQ was completed by all national study centers and shed light on different 
areas of the implementation of the ICILS 2018 main survey procedures, such as sampling, study 
preparation activities, administration of ICILS 2018, or data processing. NRCs provided their
feedback on the general approach to ICILS 2018, the procedures, and support materials provided 
by the consortium. 


Taken together, these activities form an important source of information about the implementation 
of different steps and processes of ICILS 2018, taking into account different perspectives from 
the national and international levels. 
The data were transposed from a long format to a wide format so that they could be transformed
for processing and analysis. This process resulted in a predefined table structure, containing one

imputation, or cleaning. Resolution of any inconsistencies remaining after this data entry stage
would occur during data cleaning (see below).

The rules for data entry included the following: 

• Responses to categorical questions to be generally coded as “1” if the first option was marked, “2”
if the second option was marked, and so on.

• Responses to “mark-all-that-apply” questions to be coded as either “1” (marked) or “9” (not
marked/omitted).

• Responses to numerical or scale questions (e.g., school enrolment) to be entered “as is,” that is,
without any correction or truncation, even if the value is outside the originally expected range
(e.g., if an ICT coordinator reports more than 1000 computers available to students in the
school).

• Likewise, responses to filter questions and filter-dependent questions to be entered exactly as
filled in by the respondent, even if the information provided is logically inconsistent.

• If responses were not given at all, not given in the expected format, ambiguous, or in any other
way conflicting (e.g., selection of two options in a multiple-choice question), the corresponding
variable was to be coded as “omitted or invalid.”
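The data-entry rules above can be sketched as a few small coding functions. This is a minimal illustration, not the logic of the IEA DME: the function names are hypothetical, and representing “omitted or invalid” as `None` is an assumption (the actual missing code depends on the width of each variable). Only the “1”/“9” codes for mark-all-that-apply questions come from the text.

```python
from typing import Optional, Sequence

# Codes stated in the text for mark-all-that-apply questions.
MARKED = 1
NOT_MARKED = 9  # "not marked/omitted"


def code_single_choice(marked_options: Sequence[int]) -> Optional[int]:
    """Code a categorical question from the 1-based option numbers marked.

    Returns the option number (first option -> 1, second -> 2, ...), or None
    (assumed stand-in for "omitted or invalid") if no option or more than
    one option was marked.
    """
    if len(marked_options) == 1:
        return marked_options[0]
    return None


def code_mark_all(checked: Sequence[bool]) -> list[int]:
    """Code each box of a mark-all-that-apply question as 1 or 9."""
    return [MARKED if box else NOT_MARKED for box in checked]


def code_numeric(raw: Optional[str]) -> Optional[int]:
    """Enter a numerical answer as is, with no correction or truncation.

    Out-of-range values (e.g., 1200 computers) are kept; answers that are
    not whole numbers are treated as omitted or invalid (None).
    """
    if raw is None:
        return None
    text = raw.strip()
    return int(text) if text.isdecimal() else None
```

Note that `code_numeric("1200")` keeps the implausible value, mirroring the rule that range problems are resolved later, during data cleaning, rather than at entry.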


Data entered with the IEA DME were automatically validated. First, the entered respondent ID
was validated against a five-digit code, the checksum (generated by the IEA WinW3S). A mistype in
either the ID or the checksum resulted in an error message that prompted the data-entry person to
check the entered values. The data-verification module of the IEA DME also enabled identification
of a range of problems such as inconsistencies in ID codes and out-of-range or otherwise invalid
codes. Individuals entering the data had to resolve problems or confirm potential problems before 
they could resume data entry. 


Double-data entry 


To check the reliability of the data entry within the participating countries, national centers
were required to have all of the paper-based principal, ICT coordinator, and teacher questionnaires
entered by two different staff members. IEA recommended that national centers begin the double-data
entry process as early as possible during the data capture period in order to identify possible
systematic or incidental misunderstandings or mishandlings of data-entry rules and to initiate
appropriate remedial actions, for example, retraining staff. Those entering the data were required
to resolve identified discrepancies between the first and second data entries by consulting the
original questionnaire and applying the international rules in a uniform way.




While it was desirable that each and every discrepancy be resolved before submission of the 
complete dataset, the acceptable level of disagreement between the first and second data entry 
was established at one percent or less; any value above this level required complete re-entry of the 
data. This restriction guaranteed that the margin of error observed for processed data remained 
well below the required threshold. 
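A minimal sketch of the one-percent rule follows. It assumes that both entries are held as dictionaries keyed by respondent ID (a hypothetical layout) and that disagreement is measured as the share of compared values that differ; the text does not describe exactly how IEA computed the rate.

```python
from typing import Mapping

MAX_DISAGREEMENT = 0.01  # one percent, as stated in the text

Record = Mapping[str, object]


def disagreement_rate(first: Mapping[str, Record], second: Mapping[str, Record]) -> float:
    """Share of compared values that differ between the first and second entry.

    Both arguments map a respondent ID to that respondent's entered values.
    """
    if first.keys() != second.keys():
        raise ValueError("both entries must cover the same respondent IDs")
    compared = differing = 0
    for respondent_id, rec1 in first.items():
        rec2 = second[respondent_id]
        # Compare every variable that appears in either entry of the record.
        for variable in rec1.keys() | rec2.keys():
            compared += 1
            if rec1.get(variable) != rec2.get(variable):
                differing += 1
    return differing / compared if compared else 0.0


def requires_reentry(first: Mapping[str, Record], second: Mapping[str, Record]) -> bool:
    """True if disagreement exceeds the one-percent level."""
    return disagreement_rate(first, second) > MAX_DISAGREEMENT
```

For example, one differing value out of 200 compared values gives a rate of 0.005, which stays below the threshold, while one difference out of four values (0.25) would trigger complete re-entry.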


The level of disagreement between the first and second data entry was evaluated by IEA. Data for
those countries that had administered paper-based questionnaires and submitted an IEA DME
database showed no differences between the main files and the files created for the purpose of 
double-data entry. 


Data verification at the national centers 


Before sending the data to IEA for further processing, national centers were to carry out 
mandatory validation and verification steps on all entered data and apply corrections as necessary. 
The corresponding routines were included in the IEA DME software, which automatically and 
systematically checked data files for duplicate ID codes and data outside the defined valid ranges or
value schemes. Data managers reviewed the corresponding reports, resolved any inconsistencies, 
and (where possible) corrected problems by looking up the original survey questionnaires. Data
managers also verified that all returned non-empty questionnaires had definitely been entered. 
They also checked that the availability of data corresponded to the participation indicator variables
and to the entries on the tracking forms as entered in the IEA WinW3S.


In addition to submitting the data files described above, national centers provided IEA with detailed 
data documentation, including hard copies or electronic scans of all original student and teacher 
tracking forms and a report on data-capture activities collected as part of the online survey activities
questionnaire. IEA already had access, as part of the layout verification process, to electronic copies 
of the national versions of all questionnaires and the final national adaptation forms. 


While the questionnaire data were being entered, each national center used the information from 
the teacher tracking forms to verify the completeness of the materials. Participation information 
(e.g., whether the concerned teacher had left the school permanently between the time of sampling 
and the time of administration) was entered via the IEA WinW3S. 


This process was also supported by the option in the IEA WinW3S to generate an inconsistency
report. This report listed all discrepancies between variables recorded during the within-school
sampling and test administration process and so made it possible to cross-check these data against
the actual availability of data entered in the IEA DME, the database for online respondents, and
the uploaded student data on the central international server. Data managers were requested to
resolve these problems before final data submission to IEA. If inconsistencies had to remain or
the national center could not solve them, IEA asked the center to provide documentation on these
problems. IEA used this documentation when processing the data at a later stage.


Confirming the integrity of the national databases 


Overview 


As described earlier in this chapter, national centers in each participating country were responsible 
for entering their national ICILS 2018 data into the appropriate data files and submitting these
files to IEA. Furthermore, the data from the online questionnaires were automatically stored on a
central international server. IEA then subjected these data to a comprehensive process of checking
and editing. To facilitate the data cleaning process, IEA asked the national centers to provide
it with detailed documentation of their data together with their national data files. The data
documentation included copies of all original survey tracking forms, the national versions of test 
booklets and questionnaires, as well as information from the survey activities questionnaire (see 
details in Chapter 6). National centers also submitted their final national adaptation forms in order 
to provide and confirm complete documentation on all national adaptations. In addition, national 
centers were asked to provide documentation on all changes or edits applied to the data prior to 
submission, as well as any verified findings that could remain. 


Ensuring the integrity of the international database required close cooperation between the 
international and national institutions involved in ICILS 2018. After each country had submitted 
its data and required documentation, IEA, in collaboration with the national research coordinators
(NRCs), conducted a four-step cleaning procedure on the submitted data and documentation:

(1) Documentation and structure check;

(2) ID variable cleaning;

(3) Linkage cleaning; and

(4) Background cleaning (resolving inconsistencies in questionnaire data).
The cleaning process was iterative. Numerous iterations of the four-step cleaning
procedure were completed on each national data set. This repetition ensured that all data were 
properly cleaned and that any new errors that could have been introduced during the data cleaning 
were rectified. The cleaning process was repeated as many times as necessary until all data were 
consistent and comparable. Any inconsistencies detected during the cleaning process were resolved 
in collaboration with national centers, and all corrections made during the cleaning process were 
documented in a cleaning report produced for each country. 


During the first step, IEA checked the data files provided by each country. In the following steps, 
they applied a set of over 120 cleaning rules to verify the validity and consistency of the data and 
documented any deviations from the international file structure. 


Having completed this work, IEA staff sent queries to the national centers. These required the 
centers to either confirm IEA’s proposed data-editing actions or provide additional information to 
resolve inconsistencies. After all modifications had been applied, IEA rechecked all datasets. This 
process of editing the data, checking the reports, and implementing corrections was repeated as 
many times as necessary to help ensure that data were consistent within and comparable across 
countries. 


After the national files had been checked, IEA provided national centers with univariate statistics 
at the national and international levels. This material enabled national centers to compare their 
national data with the international results as they were included in the draft international report 
and related data and documentation. 


This step was one of the most important quality measures implemented, because it helped to 
ensure the comparability of the data across countries. For example, a particular statistic that 
might have seemed plausible within a national context could have appeared as an outlier when the 
national results were compared with the international results. The outlier could hint at an error in
translation, data capture, coding, etc. The international team reviewed all such instances and, where 
necessary, addressed them, for example, by recoding the corresponding variables in appropriate 
ways or, if errors could not be corrected, removing them from the international database. 
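The text describes a review by the international team, not an automated rule. As a purely hypothetical illustration of how such outliers might be surfaced for that review, the sketch below flags a national statistic that lies far from the other countries' values using a leave-one-out z-score. Both the technique and the default cut-off of 3 are assumptions swapped in for illustration; they are not the procedure IEA used.

```python
from statistics import mean, stdev


def flag_outliers(values: dict[str, float], z_cut: float = 3.0) -> list[str]:
    """Return countries whose value is far from the other countries' values.

    Each country is compared with the mean and standard deviation of all
    *other* countries (leave-one-out), so a single extreme value cannot
    hide itself by inflating the spread.
    """
    flagged = []
    for country, value in values.items():
        others = [v for c, v in values.items() if c != country]
        if len(others) < 2:
            continue  # not enough data for a standard deviation
        center, spread = mean(others), stdev(others)
        if spread == 0:
            is_outlier = value != center
        else:
            is_outlier = abs(value - center) / spread > z_cut
        if is_outlier:
            flagged.append(country)
    return flagged
```

A flag from such a check would only be a prompt for human review: as the text notes, an unusual value may still be plausible in its national context.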


Once the national databases had been verified and formatted according to the international file 
format, IEA sent data to the ISC, which then produced and subsequently reviewed the basic item 
statistics. At the same time, IEA produced data files containing information on the participation 
status of schools, students, and teachers in each country’s sample. IEA then used this information, 
together with data captured by the software designed to standardize operations and tasks, to 
calculate sampling weights, population coverage, and school, teacher, and student participation 
rates. Chapter 7 of this report provides details about the weighting procedures. 


In a subsequent step, the ISC estimated CIL performance and CT scores as well as questionnaire
indices for students, teachers, and schools (see Chapters 11 and 12 for scaling methods and 
procedures). On completing their verification of the sampling weights and scale scores, the ISC 
sent these derived variables to IEA for inclusion in the international database and for distribution 
to the national centers. 


Data cleaning quality control 


Because ICILS 2018 was a large and highly complex study with high standards for data quality, 
maintaining these standards required an extensive set of interrelated data checking and data 
cleaning procedures. To ensure all procedures were conducted in the correct sequence, that 
no special requirements were overlooked, and that the cleaning process was implemented 
independently of the persons in charge, the data quality control included the following steps: 


• Thorough testing of all data cleaning programs: Before applying the programs to real datasets, IEA
applied them to simulation datasets containing all possible problems and inconsistencies.

The main objective of the data cleaning process was to ensure that the data adhered to international
formats, that school, teacher, and student information could be linked across different survey data
files, and that the data reflected the information collected within each country in an accurate and

The program-based data cleaning consisted of the following activities (summarized in Figure

and necessary adaptations relating to culture or language-specific terms), whereas other countries
inserted response categories within existing international variables or added national variables. 


To keep track of adaptations, IEA asked the national centers to complete national adaptation forms 
while they were adapting the international codebooks. Where necessary, IEA modified the structure 
and values of the national data files to ensure that the resulting data remained comparable across 
countries. Details about country-specific adaptations to the international instruments can be found 
in Appendix 2 of the ICILS 2018 user guide for the international database (Mikheeva and Meyer, 2020). 


At this stage, IEA discarded variables created purely for verification purposes during data entry, and 
made provision for new variables necessary for analysis and reporting. These included reporting 
variables, derived variables, sampling weights, and scale scores. 


Once IEA had ensured that each data file matched the international format, they applied a series 
of standard data cleaning rules for further processing. Processing during this step employed 
software developed by IEA that could identify and correct inconsistencies in the data. Each 
potential problem flagged at this stage was identified by a unique problem number, described 
and recorded in a database alongside the specific action taken by the cleaning program or IEA in 
relation to the problem. 


IEA referred problems that could not be rectified automatically to the responsible NRC so that the
national centers could check the original data collection instruments and tracking forms to trace
the source of these errors. Wherever possible, IEA suggested a remedy and asked the national
centers to either accept it or propose an alternative. If a national center could not resolve problems
through verification of the instruments or forms, IEA applied a general cleaning rule to the files
to rectify the error. When all automatic updates had been applied, IEA ran individual recodings in
the data to directly apply any remaining corrections to the data files.


Cleaning identification (ID) variables 
Each record in a data file needs to have a unique ID number. The existence of records with duplicate
ID numbers in a file implies an error of some kind. If two records in an ICILS 2018 database shared
the same ID number and contained exactly the same data, IEA deleted one of the records and kept
the other one in the database. If both records contained different data and IEA found it impossible
to identify which record contained the “true data,” they removed both records from the database.
IEA tried to keep such losses to a minimum; actual deletions were very rare.
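The duplicate-ID rule can be sketched as follows, assuming each record is a dictionary with a hypothetical `"id"` key. This is a simplification: in ICILS 2018, IEA first tried to establish which record held the true data, a manual step this sketch does not model.

```python
from collections import defaultdict


def resolve_duplicate_ids(records: list[dict]) -> list[dict]:
    """Apply the duplicate-ID rule to records that each carry an "id" key.

    Exact duplicates collapse to one record; if records sharing an ID
    differ, all of them are dropped. (IEA first tried to identify the
    "true" record manually; this sketch skips that step.)
    """
    groups: dict[object, list[dict]] = defaultdict(list)
    for record in records:
        groups[record["id"]].append(record)
    kept = []
    for group in groups.values():  # insertion order follows first appearance
        if all(record == group[0] for record in group):
            kept.append(group[0])  # unique ID, or identical duplicates
    return kept
```

Dropping every conflicting record, rather than guessing, matches the rule that both records are removed when the true one cannot be identified.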


Although the ID cleaning covered all data from all instruments, it focused mainly on the student
file. This step in data cleaning included the preparation of the student test records provided to
IEA by RM Results. Due to the administration of the student test on USB sticks, data uploaded
after test sessions often contained several student records within a country with the same student
ID. In most cases, such records were duplicates. In extreme cases, students from an entire
school had the same student ID.




The possible sources of multiple records were tracked back to the test administration procedures 
at schools or technical constraints of the student test delivery software. Depending on the nature 
of a multiple session, the records were used for processing, deleted, re-identified, or merged. In 
addition to checking the unique student ID number, it was crucial to check variables pertaining to 
student participation and exclusion status, as well as students’ dates of birth and dates of testing 
in order to calculate student age at the time of testing. The student tracking forms provided an 
important tool for resolving anomalies in the database. The records were cleaned by IEA with 
confirmation from national centers for individual records. Further details on cleaning of multiple 
records are reflected in Figure 10.1 below. 
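Student age at the time of testing can be derived from the dates of birth and testing mentioned above. The sketch assumes a convention of computing age in years as the difference in years plus the difference in months divided by 12; treat that formula, and the function name, as assumptions rather than the documented ICILS 2018 derivation.

```python
def age_at_testing(birth_year: int, birth_month: int, test_year: int, test_month: int) -> float:
    """Age in years at the time of testing, to month precision (assumed convention)."""
    for month in (birth_month, test_month):
        if not 1 <= month <= 12:
            raise ValueError(f"invalid month: {month}")
    if (test_year, test_month) < (birth_year, birth_month):
        raise ValueError("testing date precedes date of birth")
    return (test_year - birth_year) + (test_month - birth_month) / 12
```

For example, a student born in September 2004 and tested in March 2018 would be assigned an age of 13.5 years. Invalid or impossible dates raise an error here; in the actual cleaning, such anomalies were resolved with the help of the student tracking forms.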
Figure 10.1: Overview of data processing at IEA
Resolving inconsistencies in questionnaire data
The number of inconsistent and implausible responses in questionnaire data files varied
considerably among countries and none were completely free of inconsistent responses. IEA 
determined the treatment of inconsistent responses on a question-by-question basis, using all 
available documentation to make an informed decision. IEA also checked all questionnaire data 
for consistency across the responses given. 


For example, Question 3 in the principal questionnaire asked for the total school enrolment (number
of boys and number of girls, respectively) in all grades, while Question 4 asked for the enrolment in 
the target grade only. Clearly, the number given as a response to Question 4 could not exceed the 
number provided in Question 3. Similarly, it was not possible for the sum of all full-time teachers 
and part-time teachers, as asked in Question 6, to equal zero. In another example, Question 7 of 
the ICT coordinator questionnaire asked for the total number of ICT devices in the school and the
number of ICT devices available to students. The total number of ICT devices in the school could
not be smaller than the number available to students.


IEA flagged inconsistencies of this kind and then asked the national centers to review them. IEA
recoded those cases that could not be corrected, or where the response provided was not usable for
analysis, as “omitted.”
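The three consistency checks described above can be sketched as simple rules over a questionnaire record. The variable names are hypothetical, each check is skipped when a value is missing, and recoding *both* variables of an unresolved pair to omitted (`None` here) is an assumption; the text does not say which variable was recoded.

```python
from typing import Optional

Record = dict[str, Optional[int]]


def find_inconsistencies(rec: Record) -> list[tuple[str, str]]:
    """Return pairs of (hypothetically named) variables that contradict each other."""
    def present(*names: str) -> bool:
        return all(rec.get(name) is not None for name in names)

    findings = []
    # Target-grade enrolment cannot exceed total school enrolment.
    if present("enrol_target", "enrol_total") and rec["enrol_target"] > rec["enrol_total"]:
        findings.append(("enrol_target", "enrol_total"))
    # Full-time plus part-time teachers cannot sum to zero.
    if present("teachers_ft", "teachers_pt") and rec["teachers_ft"] + rec["teachers_pt"] == 0:
        findings.append(("teachers_ft", "teachers_pt"))
    # Devices available to students cannot exceed all ICT devices in the school.
    if present("ict_students", "ict_total") and rec["ict_students"] > rec["ict_total"]:
        findings.append(("ict_students", "ict_total"))
    return findings


def recode_unresolved(rec: Record, unresolved: list[tuple[str, str]]) -> Record:
    """Recode the variables of unresolved findings as omitted (None here)."""
    out = dict(rec)
    for pair in unresolved:
        for name in pair:
            out[name] = None
    return out
```

In the actual procedure, `recode_unresolved` would run only on findings that national centers could not correct after review.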




Filter questions, which appeared in some questionnaires, directed respondents to a particular
sub-question or further section of the questionnaire. IEA applied the following cleaning rule to these
filter questions and the dependent questions that followed: If the answer to the filter question
was “no” or “not applicable,” any responses to the dependent questions were recoded as “logically
not applicable.”
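The filter rule can be expressed as a small recoding function. In this sketch the record is a plain dictionary, the question names are hypothetical, and the missing code is represented by a string label rather than the numeric code used in the database.

```python
from typing import Iterable

LOGICALLY_NOT_APPLICABLE = "logically not applicable"


def apply_filter_rule(rec: dict, filter_var: str, dependent_vars: Iterable[str],
                      skip_answers: tuple = ("no", "not applicable")) -> dict:
    """Recode dependent questions when the filter answer makes them not applicable.

    Returns a new record; the input record is left unchanged.
    """
    out = dict(rec)
    if out.get(filter_var) in skip_answers:
        for name in dependent_vars:
            out[name] = LOGICALLY_NOT_APPLICABLE
    return out
```

Any dependent answer is overwritten, even a valid-looking one, because the filter answer takes precedence under the rule described above.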




IEA also applied what is known as a split variable check to questions where the answer was coded 
into several variables. For example, Question 26 in the student questionnaire asked students: “At 
school, have you learned about the importance of the following topics?” Student responses were 
captured in a set of four variables, each one coded as “Yes” if the corresponding “Yes” option was 
checked and “No” if the “No” option was filled in. Occasionally, students checked the “Yes” box for
some topics but left both boxes unchecked for the others. Because, in these cases, it was clear that
the unchecked boxes actually meant “No,” these responses were recoded accordingly, provided that the
students had given affirmative responses in the other categories.
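A minimal sketch of the split variable check for one student's set of topic variables, assuming each answer is stored as `"Yes"`, `"No"`, or `None` when neither box was checked (a hypothetical encoding):

```python
from typing import Optional, Sequence

Answer = Optional[str]  # "Yes", "No", or None when neither box was checked


def split_variable_check(answers: Sequence[Answer]) -> list[Answer]:
    """Recode unchecked topics as "No" if at least one topic was answered "Yes"."""
    if any(answer == "Yes" for answer in answers):
        return [answer if answer is not None else "No" for answer in answers]
    return list(answers)  # no affirmative answer: blanks stay omitted
```

The guard on at least one `"Yes"` reflects the proviso in the text: without an affirmative response elsewhere, a blank cannot safely be read as “No.”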


Resolving inconsistent tracking and questionnaire information 
Two different sets of ICILS 2018 data indicated age and gender for both teachers and students. 
The first set was tracking information provided by the school coordinator or test administrator 
throughout the within-school sampling and test/questionnaire administration process. The second 
set comprised the actual responses given by individuals in the contextual questionnaires. In some
cases, data across these two sets did not match and resolution was needed. 


If the information on gender or birth year and month was missing in the student questionnaire
but the student participated, these data were copied over from the tracking information to the
questionnaire, if available.


The teacher questionnaire did not ask teachers to provide birth year and month but rather to choose 
between five age ranges. Year of birth, which was indicated in the tracking forms, was then recoded
into age groups and cross-checked against the range indicated by the questionnaire responses. If
gender and/or age range information was missing from the teacher questionnaire but the teacher
participated, these data were copied over from the tracking information to the questionnaire.


If discrepancies were found between tracking and questionnaire gender and age data, the 
questionnaire information (for both teachers and students) replaced the tracking information. 
However, for teacher birth year, tracking information was set to missing given that only an age 
range, not a specific year, was indicated. 
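The reconciliation rules can be sketched for a single field such as gender, plus the special case of teacher birth year. The function names are hypothetical, and the age ranges in the usage example are invented for illustration; the text states only that the teacher questionnaire offered five ranges.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def reconcile(questionnaire: Optional[T], tracking: Optional[T],
              participated: bool) -> Tuple[Optional[T], Optional[T]]:
    """Reconcile one field (e.g., gender); returns (questionnaire, tracking)."""
    if questionnaire is None:
        # Missing in the questionnaire: copy from tracking if the person took part.
        if participated and tracking is not None:
            questionnaire = tracking
        return questionnaire, tracking
    if tracking is not None and tracking != questionnaire:
        tracking = questionnaire  # questionnaire information takes precedence
    return questionnaire, tracking


def reconcile_teacher_birth_year(tracking_year: Optional[int],
                                 questionnaire_range: Optional[str],
                                 year_to_range: Callable[[int], str]) -> Optional[int]:
    """Set the tracking birth year to missing if it contradicts the reported age range."""
    if tracking_year is None or questionnaire_range is None:
        return tracking_year
    return tracking_year if year_to_range(tracking_year) == questionnaire_range else None
```

Teacher birth year is set to missing rather than overwritten because the questionnaire supplies only a range, so no specific replacement year is available.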


130 ICILS 2018 TECHNICAL REPORT 


Handling of missing data 

Two types of entries were possible during the ICILS 2018 data capture: valid data values and missing
data values. Missing data could be assigned a value of “omitted or invalid” or “not administered”
during data capture. With the exception of the “not reached” missing codes assigned at ACER, IEA
applied additional missing codes to the data to facilitate further analyses. This process led to four 
distinct types of missing data in the international database: 


• Omitted or invalid: The respondent had a chance to answer the question but did not do so, leaving
the corresponding item or question blank. Alternatively, the response was non-interpretable
or out of range.

• Not administered: This signified that the item or question was not administered to the respondent,
which meant that the respondent could not read and answer the question. The not administered
missing code was used for those student test variables that were not in the sets of modules
administered to a student either deliberately (due to the rotation of modules) or, in very few
cases, due to technical failure or incorrect translations. The missing code was also used for those
records that were included in the international database but did not contain a single response
to one of the assigned questionnaires. This situation applied to students who participated in
the student test but did not answer the student questionnaire. It also applied to schools where
only one of the principal or ICT coordinator questionnaires was returned with responses. In
addition, the not administered code was used for individual questionnaire items that were
not administered in a national context because the country removed the corresponding question
from the questionnaire or because the translation was incorrect.

• Logically not applicable: The respondent answered a preceding filter question in a way that made
the following dependent questions not applicable.

• Not reached (this applied only to the individual items of the student test): This code indicated those
items that students were believed not to have reached because of a lack of time. “Not
reached” codes were derived as follows: an item received this code if a student did not respond
to any of the items following it within the same booklet¹ (i.e., the student did not complete any
of the remaining test questions), if he or she did not respond to the item preceding it, and if he
or she did not have sufficient time to finish a module in the booklet.
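The derivation of “not reached” codes can be sketched for the items of one administered booklet, in order, with `None` marking a non-response. This is an illustrative reading of the rule, not ACER's code: the function name is hypothetical, the time condition is reduced to a single boolean flag, and missing codes are represented by an enumeration rather than numeric values. Items not administered (modules outside the booklet) are assumed to be excluded from the input.

```python
from enum import Enum
from typing import Optional, Sequence, Union


class Missing(Enum):
    OMITTED_OR_INVALID = "omitted or invalid"
    NOT_ADMINISTERED = "not administered"
    LOGICALLY_NOT_APPLICABLE = "logically not applicable"
    NOT_REACHED = "not reached"


Coded = Union[int, Missing]


def code_not_reached(responses: Sequence[Optional[int]], ran_out_of_time: bool) -> list[Coded]:
    """Code the items of one administered booklet, in order (None = no response).

    Every non-response is first coded as omitted. If the student ran out of
    time, items in the trailing run of non-responses whose preceding item was
    also unanswered become "not reached"; the first item of that run stays omitted.
    """
    coded: list[Coded] = [Missing.OMITTED_OR_INVALID if r is None else r for r in responses]
    if not ran_out_of_time:
        return coded
    # Locate the start of the trailing run of non-responses.
    start = len(responses)
    while start > 0 and responses[start - 1] is None:
        start -= 1
    for i in range(start + 1, len(responses)):
        coded[i] = Missing.NOT_REACHED
    return coded
```

For responses `[1, 0, None, 1, None, None, None]` with the time flag set, the third and fifth items are coded omitted and the last two not reached. A score of 0 counts as a response, so only `None` enters the trailing run.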


Checking the interim data products 


Building the international database was an iterative process. Once IEA completed each major data 
processing step, it sent a new version of the data files to the national centers so that they could 
review their data and run their own separate checks to validate the new data file versions. This 
process implied that national centers received several versions of their data, and their data only, 
before release of the draft and final versions of the international databases. All interim data were 
made available in full to the ISC, whereas each participating country received only its own data. 


IEA sent the first version of data and accompanying documentation to national centers on October
30, 2018. At this time, data for all countries that administered ICILS in the first half of 2018 were
sent out. Countries that administered ICILS in the second half of 2018 received their
data on January 16, 2019. This first version of each country dataset included the following data
and documentation: 


• School-, student-, and teacher-level SPSS and SAS data files;

• Univariate descriptive statistics for all variables in the data files;

• A cleaning report that included a list of structural and case-level findings;

• Recoding documentation for country-specific data edits applied by IEA; and

• Cleaning documentation describing the initial cleaning procedures undertaken at IEA and
describing the data files and statistics provided.

1. The term booklet is used here in reference to any possible combination of test modules.


IEA provided the ISC with subsequent versions of the data and related documentation as soon 
as it had implemented feedback from national centers. These additional versions of the data files
were accompanied by the sampling weights and international achievement scores as soon as
these became available. During this stage of data processing, IEA asked countries to
review the documentation on adaptations to the national versions of their instruments and the 
related edits applied to the data files. 


In May 2019, all national centers received the data from all other ICILS 2018 countries. This
data version is called the draft international database since it roughly reflects the structure of
the released databases. Most notably, compared with earlier data send-outs, this version
included sampling weights and scales. All persons within the national centers needed to sign a
confidentiality agreement committing them not to share any ICILS 2018 data products or information
about the results.


During the fifth NRC meeting in Jyväskylä, Finland, in June 2019, NRCs had the opportunity to
raise any further issues concerning their data that had not yet been raised. 


In August 2019, IEA provided NRCs with an updated version of the draft international database. 
This version incorporated minor edits that resulted from internal quality control procedures and
of which IEA and ACER had been made aware. The ISC used this version of the data to produce
the updated, final tables for the international report. 


The ICILS 2018 international database 


The ICILS 2018 international database incorporated all national data files from participating 
countries. The data processing and validation at the international level helped to ensure that: 


• Information coded in each variable was internationally comparable;

• National adaptations were reflected appropriately in all variables;

• Questions that were not internationally comparable were removed from the database;

• All entries in the database could be linked to the appropriate respondent (student, teacher,
principal, or ICT coordinator);

• Only those records considered as participating (following adjudication) remained in the
international database files;

• Sampling weights and student achievement scores were available for international comparisons;
and

• Indirect identification of individuals was prevented by applying confidentiality measures, such
as scrambling ID variables or removing some of the personal data variables that were needed
only during field operations and data processing.


More information about the ICILS 2018 international database is provided in the ICILS 2018 user 
guide for the international database (Mikheeva and Meyer, 2020). 
Summary


To achieve a high-quality database, ICILS 2018 implemented a series of data management
procedures that included checks of the consistency of national database structures, the proper
documentation of all national adaptations, and the comparability of international variables
across national datasets. IEA reviewed all national databases in cooperation with national centers 
and the larger international team. The review process followed a series of thorough checking 
procedures, which led to the creation of the final ICILS 2018 database. The final data products 
included item statistics, national data files, and the international database accompanied by a user 
guide and supplementary information. 
CHAPTER 11:


Scaling procedures for ICILS 2018 test items


Louise Ockwell, Alex Daraganov, and Wolfram Schulz 


Introduction 


This chapter describes the procedures used to analyze and scale the ICILS 2018 test items that 
were administered to measure students’ computer and information literacy (CIL) and computational 
thinking (CT). It covers the following topics: 


• The scaling model used to analyze and scale the test items;

• Test coverage;

• Item dimensionality and local dependence;

• Assessment of item fit;

• Assessment of scorer reliabilities for open-ended items;

• Differential item functioning by gender;

• Review of cross-national measurement equivalence;

• International item adjudication;

• International item calibration and test reliability;

• International ability estimates (plausible values and weighted likelihood estimates); and

• Estimation of changes in students’ CIL between 2013 and 2018.


The development of the CIL and CT test items is described in Chapter 2 and was guided by the 
ICILS 2018 assessment framework (see Fraillon et al. 2019).


The scaling model 
Item response theory (IRT) scaling methodology was used to scale the test items.
For dichotomous items, we used the one-parameter (Rasch) model (Rasch 1960), which models
the probability of selecting category 1 instead of 0 as:

$$P_i(\theta_n) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$

where $P_i(\theta_n)$ is the probability for person $n$ to score 1 on item $i$, $\theta_n$ denotes the estimate of person
$n$'s location on the latent continuum (which in proficiency tests is commonly referred to as person
ability), and $\delta_i$ is the estimated location of item $i$ on the same latent continuum (which in proficiency
tests is commonly referred to as item difficulty). For each item, item responses are modeled as a
function of the latent trait $\theta_n$.
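The Rasch response probability translates directly into code. The sketch below is only an illustration of the formula, not part of the ConQuest scaling; it uses the algebraically equivalent logistic form and branches on the sign of $\theta_n - \delta_i$ so that `math.exp` cannot overflow for extreme values.

```python
import math


def rasch_probability(theta: float, delta: float) -> float:
    """P_i(theta_n) = exp(theta_n - delta_i) / (1 + exp(theta_n - delta_i))."""
    z = theta - delta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so this cannot overflow
    return e / (1.0 + e)
```

When person ability equals item difficulty, for instance `rasch_probability(0.5, 0.5)`, the probability of a correct response is exactly 0.5; it rises above 0.5 as ability exceeds difficulty.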


In the case of items with more than two categories (in the CIL and CT assessments, items with more
than one score point), this model can be generalized to the (Rasch) partial credit model (Masters
and Wright 1997), which takes the form of:

$$P_{x_i}(\theta_n) = \frac{\exp\sum_{k=0}^{x_i}(\theta_n - \delta_i + \tau_{ik})}{\sum_{h=0}^{m_i}\exp\sum_{k=0}^{h}(\theta_n - \delta_i + \tau_{ik})}, \qquad x_i = 0, 1, \ldots, m_i$$

where $P_{x_i}(\theta_n)$ denotes the probability of person $n$ scoring $x_i$ on item $i$, $m_i$ is the maximum score on item $i$,
$\theta_n$ denotes the estimate of person $n$’s location on the latent continuum, the item parameter $\delta_i$ denotes the
estimated average location (across the categories) of the item on the latent continuum, and $\tau_{ik}$ provides
an additional step parameter that denotes the distance between the estimate of each category boundary
and the estimated location of the item on the latent continuum (by convention, the $k = 0$ term of each
sum is zero). ACER ConQuest, Version 4.0 software (Adams et al. 2015) was used to scale the CIL and
CT test items for ICILS 2018.
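The partial credit model can be sketched the same way. The Python example below follows the parameterization written above: each cumulative term is $\theta_n - \delta_i + \tau_{ik}$, and the $k = 0$ term is zero. The function name and parameter values are hypothetical, and ConQuest may use different sign conventions internally.

```python
import math

def pcm_probabilities(theta, delta, taus):
    """Category probabilities P(X = 0), ..., P(X = m) for one item under the
    partial credit model, with cumulative terms (theta - delta + tau_k).

    taus -- step parameters tau_1, ..., tau_m (hypothetical values below)
    """
    # Cumulative sums of (theta - delta + tau_k); category 0 has sum 0.
    sums = [0.0]
    for tau in taus:
        sums.append(sums[-1] + (theta - delta + tau))
    # Subtract the largest sum before exponentiating to avoid overflow.
    top = max(sums)
    weights = [math.exp(s - top) for s in sums]
    total = sum(weights)
    return [w / total for w in weights]

# A three-category item (scores 0, 1, 2) with hypothetical step parameters.
probs = pcm_probabilities(theta=0.0, delta=0.0, taus=[0.5, -0.5])
print([round(p, 3) for p in probs])  # [0.274, 0.452, 0.274]
```

With a single step parameter, the function reduces to the Rasch model with difficulty $\delta_i - \tau_{i1}$, which provides a quick consistency check.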
Test coverage and item dimensionality


When measuring cognitive abilities, it is important to use test items that cover the range of
achievement found in the target population. First, we estimated the distribution of CIL and CT
among ICILS 2018 students and the location of the corresponding item thresholds (with a response
probability of rp = 0.5;¹ see Figures 11.1 and 11.2). For dichotomous items, item thresholds are equal
to item difficulties. For partial credit items, a difficulty threshold was estimated for each score.²
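One way to locate the Thurstonian thresholds described in footnote 2 is a simple numerical search. For each score k, the search finds the ability at which the model probability of scoring k or higher equals rp = 0.5. The Python sketch below uses bisection over an assumed search range of -10 to 10 logits. It applies the partial credit parameterization given above and is only an illustration, not the procedure ConQuest uses internally. All names are hypothetical.

```python
import math

def pcm_probabilities(theta, delta, taus):
    """Partial credit category probabilities (cumulative terms theta - delta + tau_k)."""
    sums = [0.0]
    for tau in taus:
        sums.append(sums[-1] + (theta - delta + tau))
    top = max(sums)
    weights = [math.exp(s - top) for s in sums]
    total = sum(weights)
    return [w / total for w in weights]

def thurstonian_thresholds(delta, taus, rp=0.5, lo=-10.0, hi=10.0, tol=1e-9):
    """For each score k = 1..m, return the ability at which P(X >= k) = rp.
    P(X >= k) rises monotonically with ability, so bisection converges."""
    thresholds = []
    for k in range(1, len(taus) + 1):
        a, b = lo, hi
        while b - a > tol:
            mid = (a + b) / 2.0
            if sum(pcm_probabilities(mid, delta, taus)[k:]) < rp:
                a = mid   # probability still too low: move right
            else:
                b = mid   # probability high enough: move left
        thresholds.append((a + b) / 2.0)
    return thresholds

# Dichotomous item: the single threshold equals the item difficulty.
print(round(thurstonian_thresholds(1.40, [0.0])[0], 4))   # 1.4
```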


Figure 11.1: Mapping of CIL student abilities and item difficulties 


¹ This means that a respondent whose score on the latent continuum equals the item location parameter has a
50 percent probability of giving a correct response.

² This “Thurstonian” threshold indicates, for each item score, the point on the latent continuum where a respondent with
the same latent trait score has a 50 percent probability of obtaining this item score or higher.
Figure 11.2: Mapping of CT student abilities and item difficulties


For both the CIL and CT assessments, the range of item difficulties broadly matched the range of
student abilities found in the student population. However, the match between test item difficulty
and student ability varied considerably across countries, depending on the distribution of student
achievement within each ICILS 2018 country (see Fraillon et al. 2020, pp. 75 and 103).
Assessment of item fit


Before reviewing the international scales in detail and more specific item statistics, we evaluated 
the model fit for individual items. One way to determine goodness of fit is by calculating a mean 
square statistic (Wright and Masters 1982). Reviewing this residual-based item fit provides an 
indication of the extent to which each item fits the item response model. However, there are no 
clear rules for acceptable item fit, and some statisticians recommend that analysts and researchers 
interpret residual-based statistics with caution (see, for example, Rost and von Davier 1994). 
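For a dichotomous Rasch item, the weighted (infit) mean square is commonly computed by summing the squared residuals between observed scores and model probabilities, then dividing by the sum of the model variances. The Python sketch below assumes that person abilities are known, although in practice they are estimates. All names are hypothetical, and the sketch illustrates the statistic rather than reproducing ConQuest's implementation. Values near 1 indicate fit as expected. Values well above 1 indicate less discrimination than the model predicts.

```python
import math

def rasch_p(theta, delta):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def weighted_mnsq(responses, thetas, delta):
    """Weighted (infit) mean square for one dichotomous item: the sum of
    squared residuals divided by the sum of model variances."""
    squared_residuals = 0.0
    variances = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, delta)
        squared_residuals += (x - p) ** 2
        variances += p * (1.0 - p)
    return squared_residuals / variances
```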


To assess the assumptions of the IRT model, our review of item fit combined several indicators:
mean square statistics, item-rest correlations, the percentages of students in each response
category, and the average ability of students in each response category. We also reviewed item
characteristic curves (ICCs). For both dichotomous and partial credit items, an ICC provides a
graphical representation of the observed success of students on an item in comparison to the
probability of success predicted by the model across the range of student abilities.
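The comparison behind an ICC can be sketched numerically. The procedure sorts students by ability, splits them into groups, and compares each group's observed proportion correct with the group's average model probability. The Python sketch below assumes a dichotomous Rasch item, known abilities, and an arbitrary default of four groups. All names are hypothetical.

```python
import math

def rasch_p(theta, delta):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def empirical_icc(responses, thetas, delta, n_groups=4):
    """Sort students by ability, split them into n_groups groups of (nearly)
    equal size, and return (mean ability, observed proportion correct,
    mean model probability) for each group."""
    pairs = sorted(zip(thetas, responses))
    size = math.ceil(len(pairs) / n_groups)
    rows = []
    for start in range(0, len(pairs), size):
        group = pairs[start:start + size]
        mean_theta = sum(t for t, _ in group) / len(group)
        observed = sum(x for _, x in group) / len(group)
        expected = sum(rasch_p(t, delta) for t, _ in group) / len(group)
        rows.append((mean_theta, observed, expected))
    return rows
```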


While the theory and principles underpinning the evaluation of model fit apply to all items, the
assessment of psychometric characteristics differs slightly depending on whether items are
dichotomous or have more than two categories. In the following sections we present an example
review of psychometric characteristics for each of these two item types.


Example dichotomous item 

Figure 11.3 shows the ICC for item R02Z, which is a dichotomously scored constructed-response
item. The curve fits the expected model very closely, as the weighted MNSQ of 0.97 also
suggests. The item (location) parameter of 1.40 indicates that this item is moderately difficult,
and it was retained for the scaling of CIL items.


Figure 11.3: Item characteristic curve by score for dichotomous item R02Z

[Figure: observed and model-predicted probability of a correct response plotted against the latent trait (logits) for item R02Z; weighted MNSQ 0.97; delta 1.40]
Example partial credit item

The ICC for item B08Z, an item with 0, 1, or 2 score points, showed that, for scores of 0 and 2,
the curves depicting the observed item responses were not entirely consistent with those
predicted by the Rasch partial credit model. Fewer of the students with lower levels of CIL
received a score of 0, and more received a score of 2, than predicted by the model. In addition,
fewer of the students of higher ability had a score of 2, and more had a score of 0, than predicted.
This indicates that the item discriminated less well than expected. However, the ICC still indicated
that students with higher levels of knowledge received higher scores for this item (Figure 11.4).
The item was retained for the scaling of CIL items because its discrimination was judged to be still
appropriate for the measurement of the latent trait.


Figure 11.4: Item characteristic curve by category for item B08Z

[Figure: observed and model-predicted probabilities for score categories 0, 1, and 2 plotted against the latent trait (logits) for item B08Z; weighted MNSQ 1.08; deltas -0.26 and 0.73]


We further analyzed the functioning of the constructed-response scoring guides in two ways. First,
we reviewed the proportion of responses in each score category to confirm that each score was
represented in the scoring. Second, we confirmed that the mean abilities of students achieving each
score on an item (e.g., 0, 1, 2) discernibly increased with the score. That is, the mean ability of
students achieving a score of 2 on a given item was higher than that of students achieving a score
of 1, which was in turn higher than that of students receiving a score of 0. This analysis confirmed
that all the constructed-response items included in the final set of scaled items were of satisfactory
psychometric quality.
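The check described above can be expressed as a small Python function. It computes the mean ability of students at each score, then confirms that every score occurs and that the means increase. Treating “discernibly increased” as a strict increase is a simplifying assumption. The names and data are hypothetical.

```python
from collections import defaultdict

def mean_ability_by_score(scores, abilities):
    """Mean ability of the students who received each score on an item."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for score, ability in zip(scores, abilities):
        totals[score] += ability
        counts[score] += 1
    return {score: totals[score] / counts[score] for score in sorted(counts)}

def scores_well_ordered(scores, abilities, max_score):
    """True if every score 0..max_score occurs and mean ability strictly
    increases from each score to the next."""
    means = mean_ability_by_score(scores, abilities)
    if sorted(means) != list(range(max_score + 1)):
        return False
    return all(means[k] < means[k + 1] for k in range(max_score))
```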
Item adjudication outcomes
In total, three CIL items (H04A, S05Z, and G05A) were not included in the calibration of items or the
scaling of students’ CIL scores. H04A and S05Z had already been excluded from analysis in 2013
but were re-administered in 2018, while G05A was newly developed for ICILS 2018. For these
three items, preliminary analysis using 2018 data showed unsatisfactory item statistics, and it was
decided to exclude them from further analysis. One further CIL item (S07Z) was excluded
from the scaling of CIL items at a later stage of analysis due to unsatisfactory scaling properties.


We computed item-rest correlations for correct responses (or partial credit responses). An item-rest
correlation is the correlation between the score on one item and the total raw score on all other items
assigned to each student. We also computed weighted item fit statistics (Table 11.1). Only five CIL
items and none of the CT items had an item-rest correlation below 0.2, which indicates rather low
discrimination. We found unsatisfactory residual-based item fit statistics for only one CIL item and
three CT items.
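The item-rest correlation can be sketched in Python as follows. The sketch assumes a complete score matrix in which every student took every item. In ICILS, the rotated design limits the rest score to the items actually assigned to each student. The sketch also assumes that each item score and each rest score has non-zero variance. All names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def item_rest_correlations(score_matrix):
    """score_matrix[s][i] is student s's score on item i. For each item,
    correlate the item score with the total score on all other items."""
    n_items = len(score_matrix[0])
    results = []
    for i in range(n_items):
        item = [row[i] for row in score_matrix]
        rest = [sum(row) - row[i] for row in score_matrix]
        results.append(pearson(item, rest))
    return results

# Items with correlations of 0.20 or lower would be flagged for review.
matrix = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print([round(r, 3) for r in item_rest_correlations(matrix)])
```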


Tables 11.1 and 11.2 show, for CIL and CT items respectively, the item-rest correlations of correct
responses to multiple-choice items or scored partial credit items and the weighted item fit statistics.
Table 11.1 still includes item S07Z, even though this item was later removed from scaling.


For CIL items, the item-rest correlations ranged from 0.11 to 0.63, while CT items had values
ranging from 0.24 to 0.70. Item-rest correlations of 0.20 or lower were usually flagged for further
review (six items). Two of the CIL items (G04Z and S06Z) were included in the scaling of CIL items
even though their weighted MNSQ statistics of around 1.20 or higher suggested somewhat less
satisfactory fit to the model, that is, less discrimination between high- and low-performing students
than the model predicts. We made similar observations for the CT items TA05Z and TA07Z, which
also showed relatively poor fit, with weighted MNSQ above 1.20.
Table 11.1: International CIL item-rest correlations and weighted item fit

Item no.  Item name  Item-rest correlation  Weighted fit    Item no.  Item name  Item-rest correlation  Weighted fit
1   B01Z  0.31  1.11    42  S08C  0.56  1.00
2   B02Z  0.38  0.99    43  S08D  0.53  0.99
3   B03Z  0.35  0.98    44  S08E  0.54  0.88
4   B04Z  0.25  1.08    45  S08F  0.55  0.83
5   B05Z  0.36  1.02    46  S08G  0.50  1.02
6   B06Z  0.28  1.03    47  G01Z  0.21  1.10
7   B07A  0.23  1.14    48  G02Z  0.27  1.10
8   B07B  0.30  0.94    49  G03Z  0.11  0.97
9   B07C  0.36  1.06    50  G04Z  0.20  [illegible]
10  B08Z  0.44  1.12    51  G05B  0.42  0.97
11  B09A  0.58  0.82    52  G06Z  0.25  1.08
12  B09B  0.63  0.89    53  G07Z  0.31  1.13
13  B09C  0.61  0.93    54  G08Z  0.25  1.11
14  B09D  0.49  0.90    55  G09C  0.43  1.02
15  B09E  0.43  1.00    56  G09D  0.44  0.95
16  B09F  0.60  0.83    57  G09E  0.39  0.92
17  B09G  0.55  0.86    58  G09F  0.36  0.87
18  H01Z  0.35  1.04    59  G09G  0.44  0.96
19  H02Z  0.38  1.03    60  R01Z  0.48  0.93
20  H03Z  0.42  0.91    61  R02Z  0.35  1.00
21  H05Z  0.31  1.08    62  R03Z  0.34  1.06
22  H06Z  0.39  0.99    63  R04Z  0.38  1.01
23  H07A  0.55  0.93    64  R05A  0.58  1.17
24  H07B  0.58  0.89    65  R06A  0.34  1.06
25  H07C  0.60  0.94    66  R06B  0.34  1.01
26  H07D  0.49  1.17    67  R07Z  0.41  1.03
27  H07E  0.43  1.15    68  R08Z  0.37  1.02
28  H07F  0.50  0.92    69  R09Z  0.36  1.04
29  H07G  0.40  0.99    70  R10Z  0.25  1.04
30  H07H  0.57  0.97    71  R11A  0.53  1.04
31  H07I  0.48  0.88    72  R11B  0.51  0.86
32  H07J  0.29  0.90    73  R11C  0.53  0.84
33  S01Z  0.23  1.07    74  R11D  0.46  0.93
34  S02Z  0.23  1.12    75  R12A  0.19  1.09
35  S03Z  0.21  1.16    76  R12B  0.37  1.00
36  S04A  0.29  1.04    77  R12C  0.44  0.94
37  S04B  0.34  1.02    78  R12D  0.37  1.01
38  S06Z  0.15  1.21    79  R12G  0.26  1.03
39  S07Z  0.15  1.16    80  R12H  0.58  0.98
40  S08A  0.52  0.87    81  R12I  0.48  0.94
41  S08B  0.59  0.82    82  R12J  0.30  0.87
Table 11.2: International CT item-rest correlations and weighted item fit

Item no.  Item name  Item-rest correlation  Weighted fit
1   TA01Z    0.46         1.12
2   TA02Z    [illegible]  1.13
3   TA03Z    [illegible]  1.18
4   TA04Z    0.24         1.09
5   TA05Z    0.37         1.21
6   TA06Z    0.54         0.98
7   TA07Z    0.42         1.27
8   TA08Z    0.34         1.12
9   TF01E    0.54         0.96
10  TF02L    0.48         0.86
11  TF03TEC  0.65         0.84
12  TF04TEC  0.68         0.80
13  TF05TEC  0.70         0.84
14  TF06TEC  0.69         0.80
15  TF07TEC  0.67         0.83
16  TF08TEC  0.54         0.91
17  TF09T    0.44         0.88
Figure 11.5: Local dependence within CIL modules

[Figure: local dependence statistics for the five CIL modules (Band competition, Breathing, School trip, Board games, and Recycling), shown separately for complete modules, large tasks, and single tasks, in the original and cleaned item sets]
Removing items with unsatisfactory psychometric characteristics reduced local dependence, but
local dependence was still evident within four out of five complete modules. When considering only
large task items, local dependence was still evident within all five modules. When considering only
single task items, it was evident within only two of the five modules.


Table 11.3: CIL item pairs showing local dependence

Module  Task type  Item 1  Item 2  MNSQ  T value      Action taken
G       Large      G09C    G09D    1.36  29           G09C and G09G combined
G       Large      G09C    G09G    1.56  42           G09C and G09G combined
H       Large      H07A    H07D    1.42  33           H07D removed
H       Large      H07C    H07D    1.40  31           H07D removed; H07B and H07C combined
H       Large      H07D    H07F    1.33  25           H07D removed
H       Large      H07D    H07G    1.33  25           H07D removed
H       Large      H07H    H07I    1.42  [illegible]  H07H and H07I combined
R       Single     R11A    R11B    1.36  24           R11A, R11B, R11C, and R11D combined
R       Single     R11A    R11C    1.43  28           R11A, R11B, R11C, and R11D combined
R       Large      R12B    R12C    1.32  28           R12B and R12C combined
S       Single     S04A    S04B    1.37  31           S04A and S04B combined
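Table 11.3 reports pair-level MNSQ and T values, but this section does not describe how they were computed. The Python sketch below is a stand-in illustration only and is not necessarily the statistic ICILS used. It computes Yen's Q3, a widely used index of local dependence that correlates the model residuals of two items across students. Residual correlations clearly above those of other item pairs suggest that the two items share variance beyond the latent trait. The sketch assumes dichotomous Rasch items and known abilities, and all names are hypothetical.

```python
import math

def rasch_p(theta, delta):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def yen_q3(resp_a, resp_b, thetas, delta_a, delta_b):
    """Yen's Q3: correlation of Rasch residuals of two items across students."""
    res_a = [x - rasch_p(t, delta_a) for x, t in zip(resp_a, thetas)]
    res_b = [x - rasch_p(t, delta_b) for x, t in zip(resp_b, thetas)]
    return pearson(res_a, res_b)
```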


Assessment of scorer reliabilities 
The scoring of constructed-response items in the ICILS cognitive test was guided by scoring guides
that were either developed for new ICILS 2018 items and then refined following the experiences in
the international field trial, or adapted from ICILS 2013 for those items that were included to
measure changes over time. Within countries, subsamples of about 20 percent of student responses
to each task were scored twice following a controlled, random allocation to different scorers. The
assignment of item responses to scorers was implemented and controlled as part of the online
scoring systems (see Chapter 3). This double-scoring procedure allowed for the assessment of
scorer reliabilities.


Table 11.4 shows the percentages of scorer agreement for CIL and CT items. These are based on
13 countries and benchmarking entities for CIL and eight countries and benchmarking entities for CT.
The percentage of agreement between scores on double-scored items ranged from 40 to 100 percent
across countries.
As has been the practice in other IEA studies, only items scored with a minimum of 70 percent
scorer agreement were included in the international database. While scorer agreement was 
above 70 percent for all constructed-response items for the pooled ICILS 2018 dataset, we also 
reviewed and adjudicated scorer agreement at the national level. There were 37 cases where 
the scoring of an open-ended response item had an agreement below 70 percent. While in eight 
out of 13 countries scorer agreement was above 70 percent for all constructed-response items, 
in some countries this criterion had not been met for several of these items. The scores for the 
corresponding items were excluded from the scaling of the corresponding national data when 
drawing plausible values but are included in the public database (see Appendix C). 
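The 70 percent criterion can be illustrated with a short Python sketch. The sketch computes exact percentage agreement between first and second scores and lists the items that fall below the cutoff. It assumes that agreement means identical scores. The item names and scores are hypothetical.

```python
def percent_agreement(first_scores, second_scores):
    """Percentage of double-scored responses that received identical scores."""
    matches = sum(a == b for a, b in zip(first_scores, second_scores))
    return 100.0 * matches / len(first_scores)

def items_below_threshold(double_scored, threshold=70.0):
    """double_scored maps item name -> (first scores, second scores).
    Return the items whose agreement falls below the threshold."""
    return sorted(
        item for item, (a, b) in double_scored.items()
        if percent_agreement(a, b) < threshold
    )

data = {
    "ITEM_A": ([0, 1, 2, 1, 0], [0, 1, 2, 1, 0]),  # 100 percent agreement
    "ITEM_B": ([0, 1, 2, 1, 0], [1, 1, 0, 1, 2]),  # 40 percent agreement
}
print(items_below_threshold(data))  # ['ITEM_B']
```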


Differential item functioning by gender 


The analysis included an exploration of item quality through an assessment of differential item
functioning (DIF) by gender. DIF occurs when groups of students with the same degree of ability
differ in their probabilities of responding correctly to an item. For example, if boys have a higher
probability than girls of the same ability of correctly answering an item, the item shows DIF with
regard to gender. This suggests a violation of the model’s assumption that the probability of a
correct response is exclusively a function of ability and not of any other characteristics of the
respondents.


It is possible to derive estimates of gender DIF by including interaction terms in the item response
model. To achieve this, gender DIF was modeled for dichotomous items as:

$$P_i(\theta_n) = \frac{\exp(\theta_n - (\delta_i - \eta_g + \lambda_{ig}))}{1 + \exp(\theta_n - (\delta_i - \eta_g + \lambda_{ig}))}$$

Here, $\theta_n$ is the estimated ability of person $n$, $\delta_i$ is the estimated location of item $i$, and $\lambda_{ig}$ is an
additional parameter for the item-by-gender interaction. However, to obtain proper estimates, we also
needed to include the overall gender effect ($\eta_g$) in the model.³ Both the item-by-gender interaction
estimates ($\lambda_{ig}$) and the overall gender effects ($\eta_g$) were constrained to have a sum of 0.


Gender DIF estimates for a partial credit model for items with more than two categories (here,
constructed-response items) could similarly be modeled as:

$$P_{x_i}(\theta_n) = \frac{\exp\sum_{k=0}^{x_i}(\theta_n - (\delta_i - \eta_g + \lambda_{ig}) + \tau_{ik})}{\sum_{h=0}^{m_i}\exp\sum_{k=0}^{h}(\theta_n - (\delta_i - \eta_g + \lambda_{ig}) + \tau_{ik})}, \qquad x_i = 0, 1, \ldots, m_i$$

Here, $\theta_n$ denotes the person’s ability, $\delta_i$ gives the item location parameter on the latent continuum, $\tau_{ik}$
is the step parameter, $\lambda_{ig}$ is the item-by-gender interaction effect, and $\eta_g$ is the overall gender effect.


Tables 11.5 and 11.6 show the gender DIF estimates for CIL and CT items. Estimates with an
absolute value above 0.3 logits were flagged as indicating a substantial amount of DIF. No item
on either scale showed DIF large enough to suggest its removal from scaling.
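The Python sketch below illustrates the dichotomous gender-DIF model above and the 0.3-logit flagging rule. This section does not specify whether the rule is applied to $\lambda_{ig}$ itself or to a derived difference between groups, so the function simply compares each supplied estimate with the cutoff. The item labelled HYPOTHETICAL is invented for illustration. The other two values are taken from Table 11.5.

```python
import math

def dif_probability(theta, delta, eta_g, lambda_ig):
    """P(correct) under the dichotomous gender-DIF model:
    exp(theta - (delta - eta_g + lambda_ig)) / (1 + exp(...))."""
    return 1.0 / (1.0 + math.exp(-(theta - (delta - eta_g + lambda_ig))))

def flag_dif(estimates, cutoff=0.3):
    """Return the items whose absolute DIF estimate exceeds the cutoff (logits)."""
    return sorted(name for name, est in estimates.items() if abs(est) > cutoff)

print(flag_dif({"B09D": 0.20, "R01Z": -0.20, "HYPOTHETICAL": -0.35}))
# ['HYPOTHETICAL']
```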


³ The minus sign ensures that higher values of the gender effect parameters indicate higher levels of item endorsement
in the gender group with the higher value (here, females).
Table 11.5: Gender DIF estimates for CIL test items

Item no.  Item name  Gender DIF estimate    Item no.  Item name  Gender DIF estimate
1   B01Z  -0.04    42  S08C  0.02
2   B02Z  -0.15    43  S08D  -0.10
3   B03Z  -0.11    44  S08E  0.08
4   B04Z  -0.03    45  S08F  0.15
5   B05Z  0.04     46  S08G  0.10
6   B06Z  -0.05    47  G01Z  -0.12
7   B07A  -0.08    48  G02Z  -0.19
8   B07B  0.04     49  G03Z  0.01
9   B07C  -0.04    50  G04Z  -0.16
10  B08Z  -0.10    51  G05B  -0.06
11  B09A  0.15     52  G06Z  -0.02
12  B09B  0.08     53  G07Z  0.09
13  B09C  0.01     54  G08Z  0.09
14  B09D  0.20     55  G09C  -0.04
15  B09E  0.13     56  G09D  [illegible]
16  B09F  0.07     57  G09E  0.04
17  B09G  0.13     58  G09F  -0.09
18  H01Z  -0.14    59  G09G  -0.12
19  H02Z  -0.03    60  R01Z  -0.20
20  H03Z  -0.08    61  R02Z  -0.05
21  H05Z  -0.07    62  R03Z  0.01
22  H06Z  0.02     63  R04Z  -0.12
23  H07A  0.09     64  R05A  -0.11
24  H07B  0.06     65  R06A  -0.03
25  H07C  0.04     66  R06B  0.00
26  H07D  -0.03    67  R07Z  0.14
27  H07E  0.02     68  R08Z  0.06
28  H07F  0.00     69  R09Z  0.00
29  H07G  -0.03    70  R10Z  0.13
30  H07H  0.04     71  R11A  0.06
31  H07I  -0.01    72  R11B  0.04
32  H07J  0.03     73  R11C  0.08
33  S01Z  0.00     74  R11D  0.03
34  S02Z  0.00     75  R12A  -0.05
35  S03Z  -0.01    76  R12B  -0.04
36  S04A  -0.04    77  R12C  -0.05
37  S04B  -0.05    78  R12D  0.02
38  S06Z  -0.10    79  R12G  0.00
39  S07Z  -0.17    80  R12H  0.01
40  S08A  0.12     81  R12I  -0.04
41  S08B  0.14     82  R12J  -0.08
Table 11.6: Gender DIF estimates for CT test items

Item no.  Item name  Gender DIF estimate
1   TA01Z    -0.02
2   TA02Z    0.09
3   TA03Z    0.12
4   TA04Z    -0.04
5   TA05Z    0.00
6   TA06Z    -0.03
7   TA07Z    0.07
8   TA08Z    -0.03
9   TF01E    0.07
10  TF02L    0.00
11  TF03TEC  0.06
12  TF04TEC  0.00
13  TF05TEC  0.01
14  TF06TEC  0.03
15  TF07TEC  -0.09
16  TF08TEC  -0.14
17  TF09T    -0.14


National reports with item statistics 


National centers were provided with item statistics (see the example for B01Z in Figure 11.6) and
were requested to review any flags for the respective test items. Flags included cases of unusual
correlations (e.g., negative correlations between a correct response and the overall score) and
items showing large differences between national and international item difficulties. They also
included open-ended items where the category-total correlations were disordered. In some cases,
national centers informed the international study center of translation, scoring, or technical problems
that had not been detected during verification. In these cases, we categorized the items as “not
administered” in the international database and excluded them from the scaling of the corresponding
national data.




Working independently from the item reviews by national centers, international study center staff 
flagged national items that showed poor scaling properties (such as item misfit or large item-by- 
country interactions) and conducted post-verifications of item translation. In one instance, we 
identified a national item that needed to be set to “not administered” in the international database 
and was consequently also excluded from the scaling of the corresponding national data. 


Appendix C provides details about items that were excluded from scaling or deleted from the 
database. 


Cross-national measurement equivalence 


With any test used to assess student achievement cross-nationally, it is important that the test 
items function similarly across those countries. Similar to the case of DIF by gender (see above), 
items show item-by-country interaction when students from different countries but with the 
same ability vary in their probability of answering these questions correctly. Test items with 
considerable item-by-country interaction are not suitable for the scaling of cognitive test items 
in international surveys. 


For the main survey analyses of test items, national item parameter calibrations were compared 
with international item parameters in order to assess the occurrence of item-by-country interaction. 
Figure 11.6: Example of item statistics provided to national centres

[Figure: national item statistics report for item B01Z (Band competition, item 01; constructed response, auto-coded), with panels showing average ability by category, point-biserial by category, fit, adjusted correlation, delta (item difficulty), item-category threshold, and item-by-country interaction flags]
Confidence intervals were computed for each national item parameter from the respective
standard errors, which were adjusted for possible design effects and for multiple comparisons.
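The text does not give the exact adjustment formulas, so the Python sketch below makes two explicit assumptions. First, the standard error is inflated by the square root of a design effect. Second, a Bonferroni correction divides the significance level by the number of comparisons. A national parameter is then treated as different from the international one when the adjusted interval excludes the international value. The names and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def national_item_ci(estimate, se, design_effect, n_comparisons, alpha=0.05):
    """Confidence interval for a national item parameter. The standard error
    is inflated by sqrt(design_effect) (assumed), and the significance level
    is Bonferroni-adjusted for n_comparisons tests (assumed)."""
    adjusted_se = se * math.sqrt(design_effect)
    z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_comparisons))
    return estimate - z * adjusted_se, estimate + z * adjusted_se

def differs_from_international(estimate, se, design_effect, n_comparisons, international):
    """True if the adjusted interval does not contain the international value."""
    low, high = national_item_ci(estimate, se, design_effect, n_comparisons)
    return not (low <= international <= high)
```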


As an example, Figure 11.7 shows the item-by-country interaction graph for CIL item H07E. The
figure shows clear and considerable variation in the relative difficulty of the item across countries.


Similar graphs produced for each test item were used in the test-item adjudication process at the
international and national levels. Information about the occurrence of cross-national DIF was also
used to identify items for post-verification checks after completion of the main data collection.


Figure 11.7: Example of item-by-country interaction graph for item H07E
Although the ICILS test items generally showed only limited item-by-country interactions, some
national item difficulties deviated quite considerably (by more than 1.3 logits) from the international
item difficulty. In these cases (see, for example, Italy in Figure 11.7), we omitted the items from
scaling. Appendix C includes the complete list of items that were omitted nationally from scaling
because of substantial item-by-country interaction.
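The 1.3-logit criterion can be sketched as follows. As an illustration, the sketch assumes that national item difficulties are first shifted so that their mean matches the international mean of the same items, so that only relative difficulty is compared. This centering step is an assumption and is not documented here as an ICILS step. The item labels and values are hypothetical.

```python
def centered_national(national, international):
    """Shift national difficulties so their mean equals the international
    mean over the items both calibrations share."""
    common = [item for item in national if item in international]
    shift = (sum(international[i] for i in common)
             - sum(national[i] for i in common)) / len(common)
    return {i: national[i] + shift for i in common}

def items_to_omit(national, international, max_gap=1.3):
    """Items whose centered national difficulty deviates from the
    international difficulty by more than max_gap logits."""
    centered = centered_national(national, international)
    return sorted(i for i, d in centered.items()
                  if abs(d - international[i]) > max_gap)

international = {"A": 0.0, "B": 1.0, "C": -1.0, "D": 0.5}
national = {"A": 0.5, "B": 1.5, "C": -0.5, "D": 3.0}
print(items_to_omit(national, international))  # ['D']
```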


Evaluating the impact of missing responses 


There were two possible types of missing responses in the ICILS test: “omitted” items
(coded in the database as 9) and “not-administered” items (coded as 8). The omitted-response
category was used when a student provided no response at all to an item administered to him or
her. Not-administered items were those that, although in the whole item pool, were not in the sets
of modules (two out of four) administered to a student. Not-administered items occurred either
by design (due to the assignment and rotation of modules) or, in the few cases described earlier in
this chapter, due to technical failure, incorrect translations, or poor scaling properties.


A separate missing category called “not reached” (coded as 7) was created for analysis and
subsequent scaling purposes at the post-processing stage. An item was assigned this code if the
student concerned did not respond to the item immediately preceding it and also did not respond
to any of the items following it within the same module (i.e., did not continue on to the end
of the test). Following a similar analysis at the field trial stage, the frequency of Code 7 responses
provided us with information about whether the test length and difficulty were appropriate.
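The not-reached rule can be sketched in Python for one student's responses to one module. The responses are given in presentation order, with None marking a missing response. Under the rule as stated, the first missing response in a trailing run remains “omitted”, and the later ones become “not reached”. The function name is hypothetical, and the sketch assumes that not-administered items (code 8) have already been removed.

```python
OMITTED, NOT_REACHED = 9, 7

def code_missing(module_responses):
    """Recode missing responses (None) within one module, in item order.
    A missing response becomes 'not reached' (7) if the preceding item is
    also missing and every later item in the module is missing; all other
    missing responses become 'omitted' (9). Answered items keep their scores."""
    coded = list(module_responses)
    for i, value in enumerate(module_responses):
        if value is not None:
            continue
        preceding_missing = i > 0 and module_responses[i - 1] is None
        rest_missing = all(v is None for v in module_responses[i + 1:])
        coded[i] = NOT_REACHED if (preceding_missing and rest_missing) else OMITTED
    return coded

print(code_missing([1, None, 0, None, None, None]))  # [1, 9, 0, 9, 7, 7]
```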
Tables 11.7 and 11.8 show the international percentages of omitted and not reached responses.
Table 11.9 shows the average percentages of missing values overall and for each module. On
average, each item was omitted by nearly eight percent of the students. The average percentage
of students who did not reach an item was about one percent.


Table 11.7: Percentages of omitted responses and items not reached due to lack of time for CIL items

Item no.  Item name  Omitted  Not reached    Item no.  Item name  Omitted  Not reached
1   B01Z  5.73         0.00    42  S08C  3.33   1.94
2   B02Z  1.32         0.00    43  S08D  3.33   1.94
3   B03Z  25.45        0.02    44  S08E  3.33   1.94
4   B04Z  7.05         0.04    45  S08F  3.33   1.94
5   B05Z  1.35         0.06    46  S08G  3.33   1.94
6   B06Z  7.92         0.10    47  G01Z  2.41   0.00
7   B07A  1.35         0.25    48  G02Z  3.46   0.00
8   B07B  1.36         0.25    49  G03Z  5.64   0.04
9   B07C  1.35         0.25    50  G04Z  1.14   0.06
10  B08Z  1.93         0.33    51  G05B  3.36   0.10
11  B09A  4.11         0.95    52  G06Z  9.43   0.15
12  B09B  4.11         0.95    53  G07Z  0.64   0.30
13  B09C  4.11         0.95    54  G08Z  0.46   0.37
14  B09D  4.11         0.95    55  G09C  25.94  0.64
15  B09E  4.11         0.95    56  G09D  25.94  0.64
16  B09F  4.11         0.95    57  G09E  25.94  0.64
17  B09G  4.11         0.95    58  G09F  25.94  0.64
18  H01Z  2.97         0.00    59  G09G  25.94  0.64
19  H02Z  [illegible]  0.00    60  R01Z  1.27   0.00
20  H03Z  3.75         0.05    61  R02Z  10.52  0.00
21  H05Z  4.73         0.89    62  R03Z  6.36   0.02
22  H06Z  2.78         1.40    63  R04Z  9.04   0.02
23  H07A  1.82         1.88    64  R05A  3.35   0.04
24  H07B  1.82         1.88    65  R06A  7.02   0.08
25  H07C  1.82         1.88    66  R06B  7.02   0.08
26  H07D  1.82         1.88    67  R07Z  4.82   0.23
27  H07E  1.82         1.88    68  R08Z  11.68  0.30
28  H07F  1.82         1.88    69  R09Z  1.55   0.61
29  H07G  1.82         1.88    70  R10Z  1.19   0.70
30  H07H  1.82         1.88    71  R11A  13.24  0.92
31  H07I  1.82         1.88    72  R11B  13.24  0.92
32  H07J  11.82        1.88    73  R11C  13.88  1.01
33  S01Z  6.68         0.00    74  R11D  13.24  0.92
34  S02Z  1.99         0.00    75  R12A  7.38   5.41
35  S03Z  34.10        0.02    76  R12B  7.38   5.41
36  S04A  1.17         0.05    77  R12C  7.38   5.41
37  S04B  1.17         0.05    78  R12D  7.38   5.41
38  S06Z  1.82         0.40    79  R12G  7.38   5.41
39  S07Z  7.02         0.52    80  R12H  7.38   5.41
40  S08A  3.30         1.94    81  R12I  7.38   5.41
41  S08B  3.33         1.94    82  R12J  7.38   5.41




ICILS 2018 TECHNICAL REPORT 


Table 11.8: Percentages of omitted responses and items not reached due to lack of time for CT items

Item no.  Item name  Omitted      Not reached
1   TA01Z    12.94        0.00
2   TA02Z    10.24        0.00
3   TA03Z    [illegible]  0.17
4   TA04Z    2.94         0.32
5   TA05Z    3.69         0.36
6   TA06Z    2.81         0.75
7   TA07Z    11.04        2.09
8   TA08Z    5.39         8.19
9   TF01E    1.84         0.00
10  TF02L    3.33         0.00
11  TF03TEC  4.01         0.40
12  TF04TEC  2.96         0.67
13  TF05TEC  3.82         [illegible]
14  TF06TEC  17.40        2.21
15  TF07TEC  8.33         6.07
16  TF08TEC  30.69        10.14
17  TF09T    18.63        30.43


Table 11.9: Percentages of omitted responses and items not reached due to lack of time overall, by module

Module            Average percentage omitted  Average percentage not reached
Band competition  4.9                         0.5
Breathing         8.9                         1.4
School trip       5.5                         1.0
Board games       12.0                        0.3
Recycling         7.7                         2.4
Grand average     7.6                         1.2


When comparing the proportions of omitted and not reached responses across CIL modules, we
observed that the Board games module had the highest percentage of omitted responses (12%),
while the Band competition module had the lowest (5%). Average percentages of not-reached
responses were close to zero for most modules. The Recycling module showed a somewhat
higher percentage of not-reached responses than the other modules.


International item calibration and test reliability 


Item parameters for CIL were obtained from a joint data file that included response data from both
ICILS 2013 and ICILS 2018. We included the ICILS 2013 data to improve the estimation of link
items and for the purpose of equating. For this, we used the preferred IEA joint calibration
methodology, which has also been applied to equate TIMSS, PIRLS, and ICCS data (see Foy and
Yin 2016, 2017; Gebhardt and Schulz 2018). We included ICILS 2018 data from all 11 countries
that met the sampling requirements. We also included ICILS 2013 data from three countries that
participated in both 2013 and 2018 and had met sample participation requirements. Denmark
participated in both 2013 and 2018 but did not meet sample participation requirements in 2018,
and so was not included in the joint data file. Countries were equally weighted within each ICILS
cycle for the CIL calibration. All items were included, except for items that were deleted nationally
or internationally following the adjudication process.
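Equal weighting of countries within each cycle can be sketched by rescaling student weights so that every country's weights sum to the same constant within a cycle. The constant of 500 and the input layout are assumptions for illustration. The text does not state which constant, if any, was used.

```python
from collections import defaultdict

def equal_country_weights(students, total_per_country=500.0):
    """students: list of (cycle, country, weight) tuples. Return rescaled
    weights so that, within each cycle, every country's weights sum to
    total_per_country (an assumed constant)."""
    sums = defaultdict(float)
    for cycle, country, weight in students:
        sums[(cycle, country)] += weight
    return [weight * total_per_country / sums[(cycle, country)]
            for cycle, country, weight in students]

students = [(2018, "X", 10.0), (2018, "X", 30.0), (2018, "Y", 5.0), (2013, "X", 1.0)]
print(equal_country_weights(students))  # [125.0, 375.0, 500.0, 500.0]
```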
Table 11.10: Final parameter estimates of the CIL items

Item no.  Item name  Item parameter  Step 1  Step 2    Item no.  Item name  Item parameter  Step 1  Step 2
1   B01Z  -0.939             42  S08C  0.017   3.145
2   B02Z  -0.895             43  S08D  0.231   0.025
3   B03Z  0.168              44  S08E  0.632   0.911
4   B04Z  0.701              45  S08F  0.931   -0.653
5   B05Z  -0.016             46  S08G  -0.285  -0.085
6   B06Z  -2.678             47  G01Z  1.424
7   B07A  -3.766             48  G02Z  -0.372
8   B07B  -3.326             49  G03Z  1.742
9   B07C  -0.905             50  G04Z  -1.045
10  B08Z  0.130   -0.746     51  G05B  -0.487
11  B09A  0.197   -1.683     52  G06Z  1.046
12  B09B  -0.712  0.915      53  G07Z  -1.543
13  B09C  -0.487  0.859      54  G08Z  -0.270
14  B09D  -0.448             55  G09C  0.255   -0.256
15  B09E  0.852              56  G09D  0.180
16  B09F  0.179              57  G09E  1.490
17  B09G  0.321              58  G09F  1.787
18  H01Z  -0.648             59  G09G  -0.052
19  H02Z  0.612              60  R01Z  -0.454
20  H03Z  -1.509             61  R02Z  1.033
21  H05Z  1.456              62  R03Z  -0.382
22  H06Z  0.529              63  R04Z  0.166
23  H07A  -1.481             64  R05A  -0.201  0.729  -0.875
24  H07B  0.197   0.108      65  R06A  -0.456
25  H07C  -0.374  -0.015     66  R06B  0.631
26  H07D  -0.743  0.245      67  R07Z  0.437
27  H07E  -0.168  0.079      68  R08Z  0.330
28  H07F  0.300   0.231      69  R09Z  -0.798
29  H07G  0.945   0.421      70  R10Z  0.647
30  H07H  -0.128  0.344      71  R11A  -1.305  2.717
31  H07I  0.291              72  R11B  -1.283
32  H07J  1.615              73  R11C  -1.226
33  S01Z  -1.971             74  R11D  -0.029
34  S02Z  2.665              75  R12A  2.088
35  S03Z  0.320              76  R12B  0.502
36  S04A  1.188              77  R12C  -0.708
37  S04B  -1.006             78  R12D  0.821
38  S06Z  0.088              79  R12G  2.044
39  S07Z  2.613              80  R12H  0.402   0.150
40  S08A  -0.509             81  R12I  1.325
41  S08B  -0.027             82  R12J  1.870
Table 11.11: Final parameter estimates of the CT items

Item no.  Item name  Item parameter  Step 1  Step 2
1   TA01Z    -0.234  0.485
2   TA02Z    -1.109  0.317
3   TA03Z    -0.646  1.157
4   TA04Z    0.733
5   TA05Z    0.233   -0.047
6   TA06Z    0.017   0.225
7   TA07Z    1.105   0.711
8   TA08Z    0.710   -0.377
9   TF01E    -1.205  [illegible]  -0.030
10  TF02L    -1.587  0.818
11  TF03TEC  -0.278  -0.466  -1.546
12  TF04TEC  -0.578  0.031   -1.497
13  TF05TEC  -0.132  0.921   0.454
14  TF06TEC  0.635   [illegible]  -0.585
15  TF07TEC  0.544   -0.429  0.603
16  TF08TEC  0.700   2.042
17  TF09T    1.090   1.663
Item parameters for CT were obtained from a data file that included response data from all seven 
countries that met the sampling requirements for ICILS 2018. 
Missing student responses that were likely to be due to problems with test length (“not reached items”) were excluded from the calibration of item parameters, but included and treated as “incorrect” when scaling the student responses. Items for which technical failures occurred were treated as not administered. Omitted items were treated as incorrect at both stages.
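The three missing-response rules above can be summarized as a small recoding step that differs between item calibration and student scaling. The sketch below is a hypothetical illustration: the marker strings, the function name `recode`, and the use of `None` for "treated as not administered" are assumptions made for this example, not ICILS data codes or ConQuest syntax.

```python
# Hypothetical missing-response markers for this sketch (not ICILS data codes).
NOT_REACHED = "not_reached"
OMITTED = "omitted"
TECH_FAILURE = "tech_failure"


def recode(responses, stage):
    """Prepare one student's scored responses for 'calibration' or 'scaling'.

    Scored responses are integers (0, 1, 2, ...); None in the output means
    the item is treated as not administered and is ignored.
    """
    if stage not in ("calibration", "scaling"):
        raise ValueError("stage must be 'calibration' or 'scaling'")
    out = []
    for r in responses:
        if r == TECH_FAILURE:
            out.append(None)  # not administered at both stages
        elif r == OMITTED:
            out.append(0)  # incorrect at both stages
        elif r == NOT_REACHED:
            # excluded from item calibration, scored incorrect for student scaling
            out.append(None if stage == "calibration" else 0)
        else:
            out.append(r)
    return out


resp = [1, OMITTED, 2, TECH_FAILURE, NOT_REACHED, NOT_REACHED]
print(recode(resp, "calibration"))  # [1, 0, 2, None, None, None]
print(recode(resp, "scaling"))      # [1, 0, 2, None, 0, 0]
```

Keeping "not reached" out of the calibration avoids making items at the end of a module look harder than they are, while still scoring them for the student.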


From this, we identified a set of item parameters that we used to scale the ICILS 2018 CIL data 
(Table 11.10). The final set of CT item parameters is displayed in Table 11.11. 


The overall test reliabilities for the two cognitive assessments following the removal of items with 
non-satisfactory psychometric properties, based on the pooled datasets and obtained from the 
scaling model, were 0.97 (CIL) and 0.84 (CT) (ACER ConQuest 4.0 estimate). 


CIL and CT estimates 


The accuracy of measuring the latent ability θ at the individual level can be improved by using a
larger number of test items. However, in large-scale surveys such as ICILS, the purpose is to obtain
accurate population estimates through use of instruments that also cover a wider range of possible
aspects of cognitive abilities.


The use of a matrix-sampling design, where individual students are allocated modules in a systematic
way and respond to a set of items obtained from the main pool of items, has become standard in
assessments of this type (see Chapter 2). However, reducing test length and administering subsets
of items to individual students introduces a considerable degree of uncertainty at the individual
level. Aggregating such imprecise individual ability estimates can lead to bias in population estimates. This problem
can be addressed by essentially treating a student's ability estimate as a missing data problem and
employing plausible value methodology that uses all available information from student tests and
questionnaires to impute an ability estimate, a process that leads to more accurate population as
well as sub-group estimates (Mislevy 1991; Mislevy and Sheehan 1987; von Davier et al. 2009).
Using item parameters anchored at their estimated values from the calibration sample makes it
possible to randomly draw plausible values from the marginal posterior of the latent distribution for
each individual. Estimations are based on the conditional item response model and the population
model, which includes the regression on background variables and between-school differences
used for conditioning. (For a detailed description see Adams et al. 1997; also Adams 2002.) In
order to obtain estimates of students’ CIL and CT, we used the ACER ConQuest 4.0 software to
draw plausible values.
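The idea of drawing plausible values from a posterior built on anchored item parameters can be illustrated with a deliberately simplified sketch. It is not ConQuest's algorithm: it assumes dichotomous Rasch items only, replaces the latent regression with a single normal prior whose mean (`prior_mean`) is assumed to come from the conditioning model, and approximates the posterior on a grid. All function and parameter names are hypothetical.

```python
import math
import random


def rasch_p(theta, delta):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def draw_plausible_values(responses, deltas, prior_mean, prior_sd,
                          n_pv=5, grid_size=401, seed=None):
    """Draw plausible values from a grid approximation of the posterior of theta.

    responses: scored 0/1 responses, None for items not administered.
    deltas: item difficulties anchored at their calibration values.
    prior_mean, prior_sd: normal prior for this student; in a full
        conditioning model the mean would come from the latent regression.
    """
    rng = random.Random(seed)
    lo, hi = prior_mean - 6 * prior_sd, prior_mean + 6 * prior_sd
    grid = [lo + (hi - lo) * k / (grid_size - 1) for k in range(grid_size)]
    log_post = []
    for t in grid:
        # log prior (normal, up to a constant) plus log likelihood of responses
        lp = -0.5 * ((t - prior_mean) / prior_sd) ** 2
        for x, d in zip(responses, deltas):
            if x is None:
                continue
            p = rasch_p(t, d)
            lp += math.log(p) if x == 1 else math.log(1.0 - p)
        log_post.append(lp)
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]  # rescale to avoid underflow
    return rng.choices(grid, weights=weights, k=n_pv)


# Hypothetical student: four items, the last one not administered.
pvs = draw_plausible_values([1, 0, 1, None], [-0.5, 0.3, 1.0, 0.0],
                            prior_mean=0.1, prior_sd=0.9, seed=1)
```

Each student receives several random draws rather than a single point estimate, so that analyses repeated over the draws reflect the measurement uncertainty.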


All available international student questionnaire variables were used for conditioning on 
background information on students. To take missing responses into account, all missing values 
in a variable were substituted with either the mode or the mean and corresponding indicators 
for the occurrence of missing values were added as additional variables. Appendix D lists all the 
international student-level variables (along with their respective scoring) that were used in the 
conditioning of plausible values for CIL and CT. 
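The substitution of missing values together with a missing-value indicator, as described above, might look like the following minimal sketch. It assumes unweighted means and modes (the text does not say whether weights were used), and the function name is hypothetical.

```python
from statistics import mean, mode


def impute_with_indicator(values, categorical=False):
    """Replace None with the mode (categorical) or mean (continuous).

    Returns (imputed_values, missing_indicator); the indicator is 1 for
    entries that were originally missing and 0 otherwise.
    """
    observed = [v for v in values if v is not None]
    fill = mode(observed) if categorical else mean(observed)
    imputed = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator


hisei, hisei_missing = impute_with_indicator([40.0, None, 60.0])
sex, sex_missing = impute_with_indicator([1, 1, None, 0], categorical=True)
```

Both the imputed variable and its indicator then enter the conditioning model, so students with missing background data are not simply assigned the average profile without trace.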


Because of the large number of variables, principal component analyses (PCA) were used to reduce 
the number of student-level variables as conditioning variables so that these reflected 99 percent 
of the variance in the original variables. At the student level, only gender and its corresponding 
missing indicator were included as direct conditioning variables. To account for between-school 
differences, stratum indicators and the average of (weighted likelihood) ability estimates for all 
other students in the same school were also introduced as direct conditioning variables. 
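The principal component step (keeping enough components to reflect 99 percent of the variance) can be sketched with NumPy. Whether the ICILS conditioning variables were standardized first is not stated here; this sketch only centers them and works from the covariance matrix, which is an assumption. The function name is hypothetical.

```python
import numpy as np


def retain_components(X, share=0.99):
    """Return principal component scores explaining at least `share` of the variance.

    X: array of shape (n_students, n_variables) holding the (imputed)
    student-level conditioning variables.
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # sort components by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()     # cumulative share of variance
    n_keep = min(int(np.searchsorted(cum, share)) + 1, len(eigvals))
    return Xc @ eigvecs[:, :n_keep]
```

The retained component scores, plus the direct conditioning variables (gender, its missing indicator, stratum indicators, and the school mean ability), would then form the regressors of the population model.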


For ICILS 2013, the CIL scale was originally established and transformed to a metric with a mean 
of 500 and a standard deviation of 100 for equally weighted ICILS 2013 countries that had 
met sampling requirements (categories 1 and 2), also excluding benchmarking participants (see 
Gebhardt and Schulz 2015). This linear transformation was computed by applying the formula: 


$$\theta'_{n(CIL)} = 500 + 100 \times \frac{\theta_{n(CIL)} - \mu_{\theta(CIL13)}}{\sigma_{\theta(CIL13)}}$$

where $\theta'_{n(CIL)}$ were the student scores in the international metric, $\theta_{n(CIL)}$ were the original logit scores,
$\mu_{\theta(CIL13)}$ was the average of students’ CIL logit scores (-0.119) for a pooled dataset with equally
weighted national ICILS 2013 samples from countries that had met IEA sample participation
requirements, and $\sigma_{\theta(CIL13)}$ was the corresponding standard deviation (1.186). This transformation
was applied to each of the five plausible values reflecting CIL derived for ICILS 2013. The equating
of CIL scores derived in ICILS 2018 is described in the next section.


To transform the original CT scores in ICILS 2018 to a new reporting metric for this cycle, a transformation
similar to the one used for establishing the CIL scale metric was applied:

$$\theta'_{n(CT)} = 500 + 100 \times \frac{\theta_{n(CT)} - \mu_{\theta(CT)}}{\sigma_{\theta(CT)}}$$

Here, $\theta'_{n(CT)}$ represents the student CT scores in the international metric, $\theta_{n(CT)}$ the original logit
scores, $\mu_{\theta(CT)}$ the average of students’ CT logit scores (-0.1490) for a pooled dataset with seven
equally weighted national ICILS 2018 samples from countries that had participated in the
international option and met IEA sample participation requirements, and $\sigma_{\theta(CT)}$ the corresponding
standard deviation (0.9702).
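Both reporting metrics use the same linear form, differing only in the pooled mean and standard deviation. A minimal sketch, with constant and function names chosen for illustration and values taken from the text; note that ICILS 2018 CIL plausible values are first equated to the 2013 logit scale (next section) before a transformation of this form is applied.

```python
def to_reporting_scale(theta, mean_logit, sd_logit):
    """Linear transformation of a logit score to a metric with mean 500 and SD 100."""
    return 500 + 100 * (theta - mean_logit) / sd_logit


# Pooled means and standard deviations (logits) quoted in the text.
CIL13_MEAN, CIL13_SD = -0.119, 1.186
CT_MEAN, CT_SD = -0.1490, 0.9702

# The transformation is applied to each plausible value separately.
ct_pvs = [0.25, 0.31, 0.18, 0.40, 0.22]  # hypothetical CT logit plausible values
ct_scores = [to_reporting_scale(pv, CT_MEAN, CT_SD) for pv in ct_pvs]
```

A student exactly at the pooled mean receives 500, and one pooled standard deviation above it receives 600.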
Equating CIL scores from ICILS 2013 and 2018


To achieve the transformation of ICILS 2018 CIL scores to the scale originally established with ICILS
2013 data, it was necessary to equate the new scale scores. As mentioned earlier, all ICILS 2013
item parameters were re-estimated concurrently during the ICILS 2018 joint calibration process.


Before joining the data from the two assessment cycles, we reviewed the relative difficulties of the 
common items to evaluate the quality of the link. In total there were 46 items common between 
the two ICILS assessment cycles. To compare relative difficulties of the 46 common items between 
the two assessments, all ICILS 2018 items were also calibrated separately using the data from 
11 countries (excluding two benchmarking entities and the data from the United States as these 
participants did not meet the sampling requirements). These item difficulties were compared with 
the item difficulties estimated from the calibration sample in ICILS 2013. 


The difference in relative difficulty for each item between the values estimated in the 2013 and 2018
calibrations was used to assess the quality of the items as a link. Differences were expected to be
close to zero. A difference of more than half a logit was considered to be large and would result in breaking
the link of that item. After careful consideration of the results of various comparisons, a final set of
36 link items was selected. The remaining 10 items common between the two cycles were kept in the
joint calibration dataset but were treated as separate items.
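The half-logit criterion can be sketched as a single screening pass. This is a simplified illustration under stated assumptions: difficulties from each separate calibration are centered on the mean of the common items so that only relative difficulty is compared, and the final ICILS selection also involved judgment across several comparisons that the sketch does not reproduce. The function name and item labels are hypothetical.

```python
def screen_link_items(d2013, d2018, threshold=0.5):
    """Split common items into retained links and broken links.

    d2013, d2018: dicts mapping item name -> difficulty (logits) from the
    two separate calibrations.
    """
    common = sorted(set(d2013) & set(d2018))
    m13 = sum(d2013[i] for i in common) / len(common)
    m18 = sum(d2018[i] for i in common) / len(common)
    keep, broken = [], []
    for item in common:
        # change in relative (centered) difficulty between the two cycles
        shift = (d2018[item] - m18) - (d2013[item] - m13)
        (broken if abs(shift) > threshold else keep).append(item)
    return keep, broken


d13 = {"A": 0.0, "B": 1.0, "C": -1.0, "D": 0.5}
d18 = {"A": 0.1, "B": 1.1, "C": -0.9, "D": 1.4}
keep, broken = screen_link_items(d13, d18)
```

Here items A, B, and C shift by the same amount relative to the common-item mean and are retained, while item D moves 0.6 logits relative to the others and would be treated as a separate item in each cycle.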


Figure 11.8: Relative item difficulties for CIL common items in 2013 and 2018 
To construct the ICILS 2018 scale, the data from 11 countries that met the sampling requirements
in 2018 were merged with the data from three trend countries collected in the ICILS 2013
assessment. A concurrent calibration was performed, regressing on the cycle, to estimate the
parameters of all items used over the two assessment cycles. In total, item difficulties were
estimated for 111 items. These included all 82 items used in ICILS 2018, 19 items used only in
ICILS 2013, and an additional 10 items that were unlinked and thus estimated for 2013 and 2018
separately (no overlap of the data in the combined dataset).


For equating purposes, the new item parameter estimates were used to redraw plausible values
for the ICILS 2013 sample, using full conditioning. We only included the three countries that
met the sampling requirements in both cycles. Subsequently, we computed the pooled mean and
standard deviation of the plausible values on the 2013 scale and on the 2018 scale. Comparing
those distributions resulted in the following linear transformation to equate the ICILS 2018
student abilities onto the historical ICILS 2013 scale in logits:


$$\theta_n^{(2013)} = 1.0258 \times \theta_n^{(2018)} + 0.2127$$

These equated plausible values were subsequently placed on the ICILS 2018 international reporting
scale by applying the same transformation as in ICILS 2013:

$$\theta'_n = 500 + 100 \times \frac{\theta_n^{(2013)} - (-0.1188)}{1.1859}$$
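The two steps above (equating to the 2013 logit scale, then applying the 2013 reporting transformation) compose into one function. A minimal sketch using the constants from the text; the function name and the example plausible values are hypothetical.

```python
def cil2018_to_reporting(theta_2018):
    """Place an ICILS 2018 CIL logit score on the ICILS 2013 reporting scale."""
    theta_2013 = 1.0258 * theta_2018 + 0.2127              # equate to 2013 logits
    return 500 + 100 * (theta_2013 - (-0.1188)) / 1.1859  # 2013 reporting metric


# Hypothetical set of five plausible values (logits) for one student.
pvs_2018 = [0.35, -0.10, 0.80, 0.12, 0.41]
scale_scores = [cil2018_to_reporting(pv) for pv in pvs_2018]
```

Because both steps are linear, the order of operations matters only in that equating must come first: the 2013 mean and standard deviation are defined on the 2013 logit scale.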


Because the transformation equating the ICILS 2018 data with the ICILS 2013 data depended on 
the change in the degree of difficulty of each of the individual link items, the sample of link items 
chosen influenced the choice of transformation. This meant that the resulting transformation 
would have been slightly different if we had chosen an alternative set of link items. Uncertainty in 
the transformation thus relates to the sampling of the link items, in the same way that uncertainty 
in values such as country averages is an outcome of the particular sample of students that is used. 


The uncertainty resulting from link-item sampling is referred to as linking error, and it is an error 
that analysts have to take into account when comparing the results arising out of different data 
collections (see Monseur and Berezner 2007). As is the situation with the error that is introduced 
through the process of sampling students, the exact magnitude of this linking error cannot be 
determined. We can, however, estimate the likely range of magnitudes for this error and take it 
into account when interpreting results. As with sampling errors, the likely range of magnitude for 
the errors is represented as a standard error. 


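The following approach has been used to estimate the equating error. Suppose we have a total of
$L$ score points in the link items in $K$ modules. Use $i$ to index items in a unit and $j$ to index units so
that $\hat{\delta}_{ij}^{y}$ is the estimated difficulty of item $i$ in unit $j$ for year $y$, and let:

$$c_{ij} = \hat{\delta}_{ij}^{2018} - \hat{\delta}_{ij}^{2013}$$

The size (number of score points) of unit $j$ is $m_j$ so that:

$$\sum_{j=1}^{K} m_j = L \quad \text{and} \quad \bar{m} = \frac{1}{K}\sum_{j=1}^{K} m_j$$

Further let:

$$\bar{c}_j = \frac{1}{m_j}\sum_{i=1}^{m_j} c_{ij} \quad \text{and} \quad \bar{c} = \frac{1}{L}\sum_{j=1}^{K}\sum_{i=1}^{m_j} c_{ij}$$

and then the link error, taking into account the clustering of items, was computed as follows:

$$\mathit{LinkError}_{2013,2018} = \sqrt{\frac{\sum_{j=1}^{K} m_j^2 (\bar{c}_j - \bar{c})^2}{K(K-1)\bar{m}^2}} = \sqrt{\frac{\sum_{j=1}^{K} m_j^2 (\bar{c}_j - \bar{c})^2}{L^2} \cdot \frac{K}{K-1}}$$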
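The clustered link error can be computed directly from the per-score-point difficulty changes grouped by unit. A minimal sketch of the formula above; the function name and the toy input are hypothetical, and each list entry is assumed to be one $c_{ij}$ value.

```python
import math


def link_error(c_by_unit):
    """Clustered link error from difficulty changes grouped by unit.

    c_by_unit: list of K lists; list j holds the c_ij values
    (2018 difficulty minus 2013 difficulty) for the m_j score points of unit j.
    """
    K = len(c_by_unit)
    m = [len(unit) for unit in c_by_unit]
    L = sum(m)
    m_bar = L / K
    c_bar_j = [sum(unit) / len(unit) for unit in c_by_unit]   # unit means
    c_bar = sum(sum(unit) for unit in c_by_unit) / L          # overall mean
    num = sum(mj ** 2 * (cj - c_bar) ** 2 for mj, cj in zip(m, c_bar_j))
    return math.sqrt(num / (K * (K - 1) * m_bar ** 2))


# Toy example: two units with two and one score points.
err = link_error([[0.1, 0.3], [-0.2]])
```

Weighting each unit mean by $m_j$ treats a whole module, rather than an individual item, as the sampled unit, which is why clustered items contribute less independent information than their count suggests.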
The development of proficiency levels for CIL


One of the objectives of ICILS was to establish a described CIL scale that would become a reference
point for future international assessments in this learning area. Establishing proficiency levels
of CIL is an informative way of describing student performance across countries and also sets
benchmarks for future surveys.


Students whose results are located within a particular level of proficiency are typically able to
demonstrate certain understandings and skills that are associated with that level. These students
also typically possess the understandings and skills defined as applying at lower proficiency levels.


When developing proficiency levels, a method was applied that ensured that the notion of “being
at a level” could be interpreted consistently and in line with the fact that the achievement scale is
a continuum. We therefore sought to provide a common understanding about what being
at a level meant and to ensure that this meaning was consistent across different proficiency levels.
This method took the following three questions into account:

• What is the expected success of a student at a particular level on a test containing items at that
level?
• What is the width of the levels in that scale?
• What is the probability that a student in the middle of a level will correctly answer an item of
average difficulty for that level?


We adopted the following two parameters for defining proficiency levels to create the properties
described below:

• The response probability (rp) for reporting item parameters: this was set at rp = 0.62 (providing a
more appropriate level of “mastery” than rp = 0.5).
• The width of the proficiency levels: this was set at 0.8 logits (i.e., in units of the original latent trait scores
from the IRT scaling model, prior to their transformation to the reporting metric).


Using these parameters, we were able to infer the following about students’ CIL in relation to the
proficiency levels:

• Students whose results placed them at the lowest possible point of a proficiency level were
likely to correctly answer (on average) slightly over 50 percent of the items on a (hypothetical)
test made up of items with locations spread uniformly across the level.
• Students whose results placed them at the lowest possible point of a proficiency level had a 62
percent probability of giving the correct response to an item at the bottom end of the proficiency
level.
• Students whose results placed them at the top of the proficiency level had a 78 percent
probability of correctly responding to an item at the bottom end of the proficiency level.
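The three properties listed above follow from rp = 0.62 and a level width of 0.8 logits. The sketch below checks them numerically, assuming a dichotomous Rasch item response function and reading "item location" as the item's rp = 0.62 location; the names are chosen for illustration.

```python
import math

RP = 0.62
WIDTH = 0.8                          # level width in logits
SHIFT = math.log(RP / (1 - RP))      # ln(0.62/0.38), about 0.49 logits


def p_correct(theta, rp_location):
    """Rasch success probability for an item given its rp = 0.62 location."""
    delta = rp_location - SHIFT      # back to the raw item difficulty
    return 1 / (1 + math.exp(-(theta - delta)))


bottom = 0.0                                     # lower bound of any level (logits)
p_bottom = p_correct(bottom, bottom)             # student and item at the bottom
p_top = p_correct(bottom + WIDTH, bottom)        # student at the top, item at the bottom

# Expected share correct for a student at the bottom when item locations are
# spread uniformly over the level (midpoint rule over a fine grid).
n = 10_000
avg = sum(p_correct(bottom, bottom + WIDTH * (k + 0.5) / n) for k in range(n)) / n
print(round(p_bottom, 2), round(p_top, 2), round(avg, 3))  # 0.62 0.78 0.522
```

The result does not depend on where the level starts, which is why the same interpretation of "being at a level" holds for every level.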


The approach that was chosen was essentially an attempt to apply an appropriate choice of mastery 
by placing item locations at rp = 0.62 while simultaneously ensuring that the approach would be 
understood by the readers of ICILS reports. 


The international research team identified and described four proficiency levels that could be 
used when reporting student performances in CIL from the assessment. Figure 11.9 shows the 
cut-points for these levels (in logits and final scale scores). The figure also cites the percentage of 
students at each proficiency level across the participating ICILS countries. 




Figure 11.9: CIL proficiency level cut-points and percentage of students at each level 


CIL scale with item locations at rp = 0.62. Percentage of students at each level: Level 4, 2%; Level 3 (576 to 661 scale points), 19%; Level 2, 36%; Level 1, 25%; below Level 1, 18%.


When reporting released CIL items and mapping them against proficiency levels, we had to 
transform location parameters of these items to a value that reflected a response probability of 
62 percent (rp = 0.62). This is achieved by adding the natural log of the odds of 62 percent chance 
to the original log odds and transforming the result to the international metric by applying the 
same transformation as for the (original) student scores. The standardized item difficulty $d_i$ for
each CIL item was obtained as follows:

$$d_i = 500 + 100 \times \frac{(1.0258\,\delta_i + 0.2127) + \ln\left(\frac{0.62}{0.38}\right) - \mu_{\theta(CIL13)}}{\sigma_{\theta(CIL13)}}$$

Here, $\delta_i$ is the item difficulty in its original metric, $\mu_{\theta(CIL13)}$ is the ICILS 2013 average of students’
CIL logit scores (-0.119) and $\sigma_{\theta(CIL13)}$ is its corresponding standard deviation (1.186) that were
used to standardize the plausible values. As the CIL item difficulty parameters ($\delta_i$) were calibrated
based on the combined set of old and new items, the same transformation as for student CIL scores
had to be applied before transforming them to the CIL reporting metric at rp = 0.62.
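The item mapping combines the equating step, the rp = 0.62 shift, and the 2013 reporting transformation. A minimal sketch for a single-parameter (dichotomous) item, using the constants quoted above; the function name is hypothetical, and mapping partial-credit items would additionally need their step parameters, which this sketch does not handle.

```python
import math

RP = 0.62


def item_scale_location(delta):
    """Reporting-scale location at rp = 0.62 for a CIL item difficulty (logits)."""
    equated = 1.0258 * delta + 0.2127                 # same equating as student scores
    rp_logit = equated + math.log(RP / (1 - RP))      # add log odds of a 62% success rate
    return 500 + 100 * (rp_logit - (-0.119)) / 1.186  # ICILS 2013 reporting metric
```

An item with raw difficulty 0 logits therefore maps to roughly 569 scale points: the 0.49-logit rp shift places it above the point where success would be 50 percent.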
CHAPTER 12:


Scaling procedures for ICILS 2018 
questionnaire items 


Wolfram Schulz and Tim Friedman 


Introduction 


This chapter describes the procedures for the scaling of the ICILS 2018 questionnaire data 
(collected from students, teachers, ICT coordinators, and school principals) and the indices that 
were derived from these data. 


When describing the ICILS 2018 questionnaire indices, we distinguish the following two general
types of indices:
• Simple indices were constructed through arithmetical transformation or recoding, for example
an index of immigration background based on information about the country of birth of students
and their parents; and
• Scale indices were derived through the scaling of items; this was typically achieved by using
item response modeling of items with two or more categories.


The chapter is divided into three main sections. The first section lists the simple indices that were
derived from ICILS 2018 data and describes how they were created. The second section outlines
the procedures used for the scaling of questionnaire data in ICILS 2018, and the third
section lists the scaled indices with statistical information on the factor structure of related
item sets, scale reliabilities, and parameters used for the item response theory (IRT) scaling.


Results from an analysis of cross-country validity of item dimensionality and constructs were 
already part of ICILS 2018 field trial analyses. The international study center at ACER conducted 
reviews of the extent to which measurement models were equivalent across participating countries 
for draft item material. To conduct this review we made use of exploratory as well as confirmatory 
factor analysis and item response modeling to examine cross-national measurement equivalence 
before the final selection of main survey questionnaire items (see examples of this type of analysis 
in Schulz 2009, 2017). 


Simple indices 


Student questionnaire 


Student age (S_AGE) was calculated as the difference between the year and month of the testing
and the year and month of a student’s birth. Information from the student questionnaire (Question
1) was used to derive age, except for students where this information was missing. In these cases
information from student tracking forms (see Chapter 10 for more details) provided data for the
calculation of this index.


The formula for computing S_AGE was:

$$\text{S\_AGE} = (T_y - S_y) + \frac{T_m - S_m}{12}$$

where $T_y$ and $S_y$ are, respectively, the year of the test and the year of birth of the tested student, in
four-digit format (e.g., “2018” or “2005”), and where $T_m$ and $S_m$ are, respectively, the month of the
test and the month of the student’s birth. The result was rounded to two decimal places.
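The age formula and the fallback to the tracking form can be sketched as follows. The function names and the `(year, month)` tuple format are assumptions for illustration only.

```python
def student_age(test_year, test_month, birth_year, birth_month):
    """S_AGE = (T_y - S_y) + (T_m - S_m) / 12, rounded to two decimal places."""
    return round((test_year - birth_year) + (test_month - birth_month) / 12, 2)


def birth_year_month(questionnaire, tracking_form):
    """Prefer the questionnaire's (year, month); fall back to the tracking form."""
    return questionnaire if questionnaire is not None else tracking_form


year, month = birth_year_month(None, (2004, 9))   # questionnaire answer missing
print(student_age(2018, 4, year, month))           # 13.58
```

A negative month difference (testing in April, born in September) correctly reduces the whole-year difference by five twelfths.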


In Question 2, students were asked about their gender. These were recorded as the sex of student 
(S_SEX), a girl (1) or a boy (0). For students with omitted data for this question, the gender from 
the tracking form was included. 
Question 3 asked students about their expected highest level of educational attainment. The
responses were classified using the International Standard Classification of Education (ISCED)
framework (UNESCO 2012). The corresponding index students’ expected education (S_ISC
1 Most countries collected more detailed information on language use. This information is not included in the international
database.


2 Students could complete parental occupation questions for up to two parents. 
and S_P2ISCO for the students’ parents. We then mapped these codes to the international
socioeconomic index of occupational status (ISEI) (Ganzeboom et al. 1992). The three indices that
we obtained from these scores were occupational status of the first parent (S_P1ISEI), occupational
status of the second parent (S_P2ISEI), and the highest occupational status of both parents
(S_HISEI), with the latter corresponding to the higher ISEI score of either parent or to the
only available parent's ISEI score. For all three indices, higher scores indicate higher levels of
occupational status.


Questions 9 and 13 asked about the highest parental education attainment for the students’ 
parents and provided the data for measuring another important family background variable. The 
core difficulties with this variable relate to international comparability (education systems differ 
widely across countries and over time within countries) and response validity (students are often 
unable to accurately report their parents’ levels of education). Levels of parental education were 
classified according to the ISCED levels. 


Recoding educational qualifications into the following categories provided indices of highest 
parental educational attainment: 


(0) Did not complete ISCED level 2; 

(1) ISCED level 2 (lower-secondary); 

(2) ISCED level 3 (upper-secondary); 

(3) ISCED level 4 (non-tertiary post-secondary) or ISCED level 5 (vocational tertiary); and 

(4) ISCED level 6 (bachelor’s level tertiary) or ISCED level 7 (master’s level tertiary) or ISCED
level 8 (doctorate level tertiary).


Indices with these categories are available for each student’s first parent (S_P1ISCED) and second 
parent (S_P2ISCED). The index for highest educational level of parental education (S_HISCED) 
corresponds to the higher ISCED level of either parent. 
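S_HISEI and S_HISCED both follow a "higher of the two parents, or the only available parent" rule. A minimal sketch of that rule; the helper name is hypothetical, and returning `None` when both parents are missing is an assumption about how such cases are stored.

```python
def highest_parent(p1, p2):
    """Higher of two parental values, or the only available one (None if both missing)."""
    available = [v for v in (p1, p2) if v is not None]
    return max(available) if available else None


s_hisei = highest_parent(45.2, None)   # only one parent reported: 45.2
s_hisced = highest_parent(2, 4)        # higher ISCED category: 4
```

The same helper works for both indices because ISEI scores and the recoded ISCED categories are each ordered so that larger values mean higher status or attainment.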


Question 14 of the ICILS student questionnaire asked students how many books they had in their 
homes. Responses to this question formed the basis for an index of students’ home literacy resources 
(S_HOMLIT) with the following categories: 


(0) 0 to 10 books;
(1) 11 to 25 books;
(2) 26 to 100 books;
(3) 101 to 200 books; and
(4) More than 200 books.


Question 15B of the ICILS student questionnaire was an international option question where 
students were asked if they have an internet connection at home. The index (S_INTNET) was recorded 
as Yes (1) or No (0). 


In Question 16, students were asked their number of years of experience in using different types 
of ICT devices. The three types of devices included: 


• Desktop or laptop computers;
• Tablet devices or e-readers (e.g., iPad, Tablet PC, Kindle); and
• Smartphones except for using text and calling.


Responses to these items were coded into three respective indices: computer experience in years
(S_EXCOMP), tablet experience in years (S_EXTAB), and smartphone experience in years (S_EXSMART)
with the following response options available:

(0) Never or less than one year;
(1) At least one year but less than three years;
(2) At least three years but less than five years;
(3) At least five years but less than seven years; and


Question 30 of the student questionnaire asked students whether they studied
a computing subject (computing, computer science, information technology, informatics or similar)
during the current school year (S_ICTSTUD).


Teacher questionnaire


The sex of teacher (T_SEX) was computed from the data captured from Question 1 of the teacher
questionnaire. Female teachers were coded as 1, whereas male teachers were coded as 0.


isted of the midpoint of the age ranges given in Question 2 of the teacher
questionnaire. We assigned “less than 25” a value of 23 and coded “60 or over” as 63.


Question 4 asked teachers to indicate the number of schools in which they teach the target grade
ation for the current school year. Data captured from this question were used to calculate 


staff time to sampled school (T_WGT) and was coded as: 
ed school; 

hool and another school; 

hool and in two other schools; and 


hool and in three or more other schools. 


Question 5 of the teacher questionnaire asked teachers to indicate how long they have been using
purposes. They were asked to distinguish between ICT experience with
EXLES) and ICT experience with ICT use for preparing lessons (T_EXPREP).


These indices were coded as:

(0) Never;
(1) Less than two years;
(2) Between two and five years; and
(3) More than five years.


School questionnaires


The first question of the principal questionnaire asked whether respondents were male or female.
This was used to form the index sex of principal (P_SEX) where female principals were coded as 1,
and male principals coded as 0.


The ICILS school principal questionnaire asked in Question 3 about the number of girls and boys
in the entire school (IP2G03A, IP2G03B) and in Question 4 also about the enrolled girls and boys
at the target grade (IP2G04A, IP2G04B). The numbers given for each gender group were summed
to form an index of the number of students in the entire school (P_NUMSTD) and of the number of
students in the target grade (P_NUMTAR).³


Question 5 also asked principals to report the lowest (youngest) (IP2G05A) grade and the highest
(oldest) (IP2G05B) grade taught in their school. The difference between these two grades was
calculated as the number of grades in school (P_NGRADE).


3 These indices will be included in the ICILS 2018 Restricted Use Data file, while the ICILS 2018 Public Use Data file will
include these variables in a categorized form as PNUMSTD_CAT (1 = 1-300, 2 = 301-600, 3 = 601-900, 4 = more than 
900) and PNNUMTAR_CAT (1 = 1-100, 2 = 101-200, 3 = more than 200). 
Question 6 collected the information on the number of teachers (P_NUMTCH) in a school. The
index was calculated by summing the total number of full-time teachers (IP2G06A) with the total
number of part-time teachers weighted at 50 percent (0.5 × IP2G06B).⁴ The ratio of school size
and teachers (P_RATTCH) was calculated by dividing the number of teachers (P_NUMTCH) by the
number of students (P_NUMSTD) in a school.


Question 8A collected information about whether the school was a public school or private school.
This was used to form a private school indicator (P_PRIV) where public schools are coded as 0 and
private schools coded as 1.⁵


Question 8B asked principals to estimate the percentage of students at their school coming
from economically affluent homes and those from economically disadvantaged homes (“0-10%,”
“11-25%,” “26-50%,” and “more than 50%” for each home type). We used the responses to
compute an indicator of school composition by student background (P_COMP) where a value of 1 was
assigned to “schools with more affluent than disadvantaged students,” 2 to “schools with neither
more affluent nor more disadvantaged students,” and 3 to “schools with more disadvantaged than
affluent students.”
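The text does not spell out how "more affluent than disadvantaged" was decided from the two answers. One plausible reading, sketched below purely as an assumption, compares the two ordered response categories directly; the category list and function name are hypothetical.

```python
# Ordered response categories of Question 8B (labels as given in the text).
CATEGORIES = ["0-10%", "11-25%", "26-50%", "more than 50%"]


def school_composition(affluent, disadvantaged):
    """P_COMP sketch: 1 = more affluent, 2 = neither, 3 = more disadvantaged.

    Assumes the two answers are compared as ordered categories.
    """
    a = CATEGORIES.index(affluent)
    d = CATEGORIES.index(disadvantaged)
    if a > d:
        return 1
    if a < d:
        return 3
    return 2
```

Under this reading, two answers in the same category (for example, both "11-25%") fall into the middle group, since the categories are too coarse to tell which share is larger.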


Question 3 of the ICT coordinator questionnaire asked respondents to indicate the ICT experience
in years in the school (C_EXP). These were recoded as:

(0) Never, we do not use ICT;
(1) Fewer than 5 years;
(2) At least 5 but fewer than 10 years; and
(3) 10 years or more.


Question 7 of the ICT coordinator questionnaire collected data on the number of desktop 
computers, the number of laptop/notebooks, and the number of tablet devices in the school. The 
sum of ICT devices (C_ICTDEV) was derived by adding these numbers. Respondents were also 
asked to indicate the number of these different types of devices that were available for student use. 
These were summed to derive an index of sum of ICT devices available for student use (C_ICTSTD). 
The question also asked respondents to indicate the number of school-provided smart boards or 
interactive white boards available in the school. In conjunction with the number of students at 
school (P_NUMSTD) these data also provided the following ratios: 


• Ratio of school size and number of ICT devices (C_RATDEV) = Number of students in the school
(P_NUMSTD) / number of ICT devices in the school altogether (C_ICTDEV).
• Ratio of school size and number of devices available for students (C_RATSTD) = Number of students
in the school (P_NUMSTD) / number of ICT devices in the school that are available to students
(C_ICTSTD).
• Ratio of school size and smart boards (C_RATSMB) = Number of students in the school
(P_NUMSTD) / number of smart boards or interactive white boards available (II2G07C).
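The device sums and ratios above can be sketched as follows. The function and argument names are hypothetical, and returning `None` when a denominator is missing or zero is an assumption, since the text does not describe how such schools were handled.

```python
def ratio(numerator, denominator):
    """Students per device; None when either count is missing or the denominator is zero."""
    if numerator is None or not denominator:
        return None
    return numerator / denominator


def ict_ratios(p_numstd, desktops, laptops, tablets,
               desktops_std, laptops_std, tablets_std, smart_boards):
    c_ictdev = desktops + laptops + tablets                  # all ICT devices
    c_ictstd = desktops_std + laptops_std + tablets_std      # devices for student use
    return {
        "C_ICTDEV": c_ictdev,
        "C_ICTSTD": c_ictstd,
        "C_RATDEV": ratio(p_numstd, c_ictdev),
        "C_RATSTD": ratio(p_numstd, c_ictstd),
        "C_RATSMB": ratio(p_numstd, smart_boards),
    }


school = ict_ratios(p_numstd=600, desktops=40, laptops=20, tablets=15,
                    desktops_std=30, laptops_std=20, tablets_std=10,
                    smart_boards=12)
```

With these hypothetical counts the school has 8 students per device overall, 10 per student-available device, and 50 per smart board.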


4 This index will be included in the ICILS 2018 Restricted Use Data, while the ICILS 2018 Public Use Data will include this 
variable in a categorized form as PNNUMTCH_CAT (1 = 1-25, 2 = 26-50, 3 = 51-75, 4 = more than 75). 
5 This index will only be included in the ICILS 2018 Restricted Use Data. 
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As part of the scaling analyses of questionnaire data, we reviewed reliabilities both overall and for 
national samples using Cronbach's alpha coefficient as an estimate of the internal consistency of 
each scale (Cronbach 1951). When reviewing reliabilities we regarded values above 0.7 as indicating 
satisfactory internal consistency and those above 0.8 as showing high degrees of reliability (see, for 


example, Nunnally and Be 


rnstein 1994, pp. 264-265). Apart from scale reliabilities, this analysis 


stage also considered the percentages of missing responses (which tended to be very low in most 


cases) as well as the correl 
items in a scale (adjusted i 


Confirmatory factor ana 


Structural equation mode 


data. At the field trial sta 
structures.© When using 


the SEM framework, laten 


X=A,E+86 


ations between individual items and the scale scores based on all other 
tem-total correlations). 


lysis 


ing (SEM) (Kaplan 2009) provides a tool for modeling and confirming 


theoretically expected dimensions measured with sets of student, teacher, or school questionnaire 


ge, it can also be used to re-specify originally expected dimensional 
confirmatory factor analysis, researchers acknowledge the need to 


employ atheoretical model of item dimensionality that can be tested via the collected data. Within 


t variables link to observable variables via measurement equations. An 


observed variable x is thus modeled as: 


where A, is aqxk matrix of factor loadings, § denotes the latent variable(s), and dis aq x 1 vector 


of unique error variables. 
factor structure. 


When conducting the con 


model-fit indices provided measures of the extent to which a particular model with an assumed 
a-priori structure had a good fit with regard to the observed data. For the ICILS 2018 analysis, the 
assessment of model fit was primarily conducted through reviews of the root-mean square error of 
approximation (RMSEA), the comparative fit index (CF), and the non-normed fit index (NNFI), all of 


which are less affected th 
Long 1993). 


The expected covariance matrix is fitted according to the theoretica 


firmatory factor analyses for ICILS 2018 questionnaire data, selected 


an other indices by sample size and model complexity (see Bollen and 


We interpreted RMSEA values indicating model fit as unacceptable with values over 0.10, as marginally satisfactory with values between 0.08 and 0.10, as satisfactory between 0.05 and 0.08, and as a close fit with values lower than 0.05 (see MacCallum et al. 1996). As additional fit indices, CFI and NNFI are bound between 0 and 1. Values below 0.90 indicate a non-satisfactory model fit whereas values greater than 0.95 were interpreted as suggesting a close model fit (see Bentler and Bonett 1980; Hu and Bentler 1999).

In addition to these fit indices, reviews of standardized factor loadings and the corresponding residual item variances provide further evidence of model fit for questionnaire data. Standardized factor loadings $\lambda'$ can be interpreted in the same way as standardized regression coefficients by assuming that indicator variables are regressed on an underlying latent factor. The loadings reflect the extent to which each indicator measures the underlying construct. Squared standardized factor loadings indicate how much variance in an indicator variable can be explained by the latent factor and are related to the (standardized) residual variance estimate $\delta'$ (reflecting the estimated proportion of unexplained variance) as:

$$\delta' = 1 - \lambda'^2$$
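The interpretation rules above can be encoded as a small lookup. This is a minimal Python sketch, not the procedure used for ICILS 2018: it assumes that a value lying exactly on a cut-off falls into the better-fitting band (the text does not say how boundaries are treated), and "intermediate" for CFI/NNFI values between 0.90 and 0.95 is our own hypothetical label. The last function applies the relation between a standardized loading and its residual variance.

```python
def classify_rmsea(rmsea: float) -> str:
    """Label an RMSEA value with the cut-offs given in the text (MacCallum et al. 1996)."""
    if rmsea > 0.10:
        return "unacceptable"
    if rmsea > 0.08:
        return "marginally satisfactory"
    if rmsea >= 0.05:
        return "satisfactory"
    return "close fit"


def classify_incremental(index: float) -> str:
    """Label a CFI or NNFI (TLI) value; both indices are bounded between 0 and 1."""
    if not 0.0 <= index <= 1.0:
        raise ValueError("CFI/NNFI must lie between 0 and 1")
    if index < 0.90:
        return "not satisfactory"
    if index > 0.95:
        return "close fit"
    return "intermediate"  # hypothetical label: the text names no band for 0.90-0.95


def residual_variance(std_loading: float) -> float:
    """Standardized residual variance delta' = 1 - lambda'^2."""
    return 1.0 - std_loading ** 2


print(classify_rmsea(0.060), classify_incremental(0.98), residual_variance(0.8))
```

For example, the pooled-sample RMSEA of 0.060 reported for Figure 12.3 is classified as satisfactory, and a standardized loading of 0.8 leaves about 36 percent of the indicator variance unexplained.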


6 In the initial stages of field trial analyses, we also employed exploratory factor analysis (Tucker and MacCallum 1997; Fabrigar et al. 1999) to determine item dimensionality of larger item pools.
The use of multidimensional models also allows an assessment of the estimated correlation(s) between latent factors, which provide(s) information on the similarity of the different dimensions measured by related item sets.
Generally, maximum likelihood estimation and covariance matrices are not appropriate for analyses 
of (categorical) questionnaire items because the approach treats items as if they were continuous. 
Therefore, the analyses of ICILS 2018 relied on robust weighted least squares estimation (see 
Muthén et al. 1997; Flora and Curran 2004) to estimate the confirmatory factor models. The 
software package used for the estimations was MPLUS 7 (Muthén and Muthén 2012). 


Confirmatory factor analyses were carried out for sets of conceptually related questionnaire items that measured one or more different dimensions. This approach allowed an assessment of the measurement model as well as of the associations between related latent factors. The scaling analyses were restricted to data from those countries that met sample participation requirements (see Chapter 8 for further information). National samples of students, teachers, and schools received weights that ensured equal representation of countries in the analyses.


In international studies, model parameters may vary across countries and it may not be appropriate to assume the same factor structure for each population. To test parameter invariance, multiple-group modeling, as an extension of CFA, offers an approach to test the equivalence of measurement models across sub-samples (Little 1997; Byrne 2008).


When considering a model where respondents belong to different groups indexed as g = 1, 2, ..., G, the multiple-group factor model becomes

$$x_g = \Lambda_x^{g} \xi_g + \delta_g$$


A test of factorial invariance ($H_\Lambda$) where factor loadings are defined as being equal (often referred to as “metric equivalence”) (Horn and McArdle 1992) can be defined as

$$H_\Lambda: \Lambda_x^{1} = \Lambda_x^{2} = \Lambda_x^{3} = \dots = \Lambda_x^{G}$$


Typically, model-fit indices are compared across different multiple-group models, each with an increasing degree of constraints: from relaxed models with no constraints through to constrained models with largely invariant model parameters.


In this report, for all student and teacher questionnaire scales,⁷ three different multiple-group models for CFA were estimated with different levels of constraints on parameters:

A. Unconstrained models where all parameters are estimated as country-specific (configural invariance);
B. Models with constrained factor loadings across countries (metric invariance); and
C. Models with constraints on factor loadings and intercepts (scalar invariance).


Models with confirmed scalar invariance are the only ones that ensure full comparability of measurement models across participating countries. When comparing model fit across the three conditions, it needs to be acknowledged that with data from large samples, as is typically the case in international large-scale assessments, even very small differences appear to be significant. This makes hypothesis testing using tests of statistical significance difficult and therefore, when reviewing results, it is more appropriate to focus on relative changes in model fit across the three models with different levels of constraints.
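To illustrate the focus on relative changes, the sketch below computes the change in each index from one level of constraints to the next, using the multiple-group values reported for Figure 12.3 (students’ use of ICT for activities). It is a minimal Python illustration: it applies no significance test and no cut-off for the size of a change, since the text does not specify one, and the names are hypothetical.

```python
from typing import Dict, Sequence

# Multiple-group fit indices reported for Figure 12.3 (use of ICT for activities).
FIGURE_12_3 = {
    "configural": {"RMSEA": 0.077, "CFI": 0.97, "TLI": 0.95},
    "metric": {"RMSEA": 0.096, "CFI": 0.93, "TLI": 0.92},
    "scalar": {"RMSEA": 0.091, "CFI": 0.89, "TLI": 0.93},
}


def fit_changes(fit: Dict[str, Dict[str, float]],
                order: Sequence[str] = ("configural", "metric", "scalar"),
                ) -> Dict[str, Dict[str, float]]:
    """Change in each fit index when moving to the next, more constrained model."""
    changes = {}
    for looser, stricter in zip(order, order[1:]):
        changes[f"{looser} -> {stricter}"] = {
            index: round(fit[stricter][index] - fit[looser][index], 3)
            for index in fit[looser]
        }
    return changes


for step, deltas in fit_changes(FIGURE_12_3).items():
    print(step, deltas)
```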


7 For school questionnaire data, we did not conduct this type of analysis given the relatively small size of national samples.
Item response modeling
In line with the scaling of test item data (see Chapter 11), item response modeling was used to 
scale questionnaire items. The one-parameter (Rasch) model (Rasch 1960) for dichotomous items 
models the probability of selecting an item category 1 instead of 0 as:

$$P_i(\theta_n) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$

where $P_i(\theta_n)$ is the probability of person n scoring 1 on item i, $\theta_n$ is the estimated latent trait of person n, and $\delta_i$ is the estimated location of item i on this dimension. For each item, item responses are modeled as a function of the latent trait $\theta_n$.
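A direct transcription of the dichotomous Rasch model, as a minimal Python sketch; the function name and example values are illustrative only.

```python
import math


def rasch_probability(theta: float, delta: float) -> float:
    """P(X = 1 | theta) = exp(theta - delta) / (1 + exp(theta - delta))."""
    # Algebraically equivalent logistic form, which avoids overflow for large theta.
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


print(rasch_probability(0.5, 0.5))            # person located at the item: 0.5
print(rasch_probability(math.log(3.0), 0.0))  # theta - delta = ln 3: about 0.75
```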


In the case of items with more than two categories (as, for example, with Likert-type items), this 
model can be generalized to the (Rasch) partial credit model (Masters and Wright 1997), which 
takes the form of: 

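$$P_{xi}(\theta_n) = \frac{\exp \sum_{j=0}^{x} (\theta_n - \delta_i + \tau_{ij})}{\sum_{h=0}^{m_i} \exp \sum_{j=0}^{h} (\theta_n - \delta_i + \tau_{ij})}, \quad x = 0, 1, \dots, m_i$$

where $P_{xi}(\theta_n)$ denotes the probability of person n scoring x on item i, $\theta_n$ denotes the person’s latent trait, the item parameter $\delta_i$ describes the location of the item on the latent continuum, $\tau_{ij}$ provides an additional step parameter, and $m_i$ is the highest score category of item i.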
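The partial credit model can be sketched the same way. This minimal Python version follows the “+ τ” form written above; texts and software differ in the sign convention for step parameters, so the sign should be checked before published step parameters are plugged in. The j = 0 term is common to numerator and denominator and cancels, so it is omitted. Parameter values in the example are hypothetical.

```python
import math
from typing import List, Sequence


def pcm_probabilities(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """P(X = x), x = 0..m, under the partial credit model in the '+ tau' form.

    taus holds tau_1..tau_m for one item.
    """
    exponents = [0.0]  # x = 0: empty sum
    for tau in taus:
        exponents.append(exponents[-1] + theta - delta + tau)
    top = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# With a single step parameter of 0 the model reduces to the Rasch model.
print(pcm_probabilities(0.3, -0.2, [0.0]))
print(pcm_probabilities(1.2, 0.4, [0.5, -0.1, -0.4]))
```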


Weighted mean-square statistics (infit), statistics based on model residuals, were used in 
conjunction with a wide range of further item statistics to assess the fit of the IRT model. ICILS 
2018 used the ACER ConQuest software package (Adams et al. 2015) for the analysis of item 
scaling properties and the estimation of item parameters. 


The international item parameters were derived using equally weighted national datasets:⁸

A. Calibration of item parameters for the student questionnaire: This was done based on a pooled database with equally weighted national samples from 11 countries that met sample participation requirements for the student survey.

B. Calibration of item parameters for the teacher questionnaire: This was done based on a pooled database with equally weighted national samples from seven countries that met sample participation requirements for the teacher survey.

C. Calibration of item parameters for the school questionnaire: This was done based on a pooled database with equally weighted national samples from 11 countries that met sample participation requirements for the student survey.
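Equal weighting of national samples can be illustrated by rescaling each country’s weights so that every country contributes the same total. This is an assumption-laden sketch: the target total of 500 and the (country, weight) record layout are hypothetical choices, and the text does not state the constant used in ICILS 2018.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def equalize_country_weights(records: List[Tuple[str, float]],
                             target_total: float = 500.0) -> List[Tuple[str, float]]:
    """Rescale weights so that the weights of each country sum to target_total."""
    totals: Dict[str, float] = defaultdict(float)
    for country, weight in records:
        totals[country] += weight
    return [(country, weight * target_total / totals[country])
            for country, weight in records]


sample = [("A", 1.0), ("A", 3.0), ("B", 10.0)]
print(equalize_country_weights(sample))  # A: 125.0 and 375.0, B: 500.0
```

Within each country the relative weights are unchanged; only the country totals are equalized.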


Following the estimation of international item parameters from the calibration sample, we computed weighted likelihood estimates to obtain individual student, teacher, or school scores. Weighted likelihood estimates are computed by solving the equation

$$r_n + \frac{J_n}{2 I_n} - \sum_{i \in \Omega} \sum_{x=1}^{m_i} \frac{x \exp \sum_{j=0}^{x} (\theta_n - \delta_i + \tau_{ij})}{\sum_{h=0}^{m_i} \exp \sum_{j=0}^{h} (\theta_n - \delta_i + \tau_{ij})} = 0$$

for each case n, where $r_n$ is the sum score obtained from a set of k items (denoted $\Omega$) with j steps between adjacent categories. This can be achieved by applying the Newton-Raphson method. The term $J_n/2I_n$ (with $I_n$ being the information function for student n and $J_n$ being its derivative with respect to $\theta$) is used as a weight function to account for the bias inherent in maximum likelihood estimation (see Warm 1989). We used the ACER ConQuest software for the pre-calibration of item parameters in order to subsequently derive scale scores.
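The estimating equation can be solved numerically. The following minimal Python sketch is not ConQuest’s implementation: it uses the “+ τ” partial credit form written above, computes $I_n$ as the sum of item score variances and $J_n$ as the sum of third central moments (their exact forms for this model family), and damps Newton-Raphson steps to at most one logit, which is our own safeguard. Item parameters in the example are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical item representation: (delta_i, [tau_i1, ..., tau_im]).
Item = Tuple[float, Sequence[float]]


def pcm_probabilities(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """Category probabilities under the PCM in the '+ tau' form."""
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + theta - delta + tau)
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def score_moments(theta: float, delta: float, taus: Sequence[float]):
    """Expected item score and its 2nd, 3rd, and 4th central moments."""
    probs = pcm_probabilities(theta, delta, taus)
    mean = sum(x * p for x, p in enumerate(probs))
    central = [sum((x - mean) ** r * p for x, p in enumerate(probs)) for r in (2, 3, 4)]
    return (mean, *central)


def wle(raw_score: float, items: Sequence[Item], theta: float = 0.0,
        tol: float = 1e-8, max_iter: int = 100) -> float:
    """Solve r_n + J_n / (2 I_n) - sum_i E_i(theta) = 0 by Newton-Raphson."""
    for _ in range(max_iter):
        expected = info = j = dj = 0.0
        for delta, taus in items:
            mean, m2, m3, m4 = score_moments(theta, delta, taus)
            expected += mean
            info += m2              # I_n: sum of item score variances
            j += m3                 # J_n = dI_n/dtheta: third central moments
            dj += m4 - 3 * m2 ** 2  # dJ_n/dtheta: fourth cumulants
        f = raw_score - expected + j / (2 * info)
        f_prime = -info + (dj * info - j ** 2) / (2 * info ** 2)
        step = max(-1.0, min(1.0, f / f_prime))  # damp large steps
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("WLE iteration did not converge")


items = [(-0.5, [0.0]), (0.5, [0.0]), (0.0, [0.6, -0.6])]
print(round(wle(2.0, items), 3))
```

For a single dichotomous item located at 0, the sketch returns ln 3 (about 1.10) for a score of 1 and -ln 3 for a score of 0, finite values where maximum likelihood estimation would diverge.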


8 Data from benchmarking participants were not included in the item parameter calibrations. 
For all questionnaire scales in ICILS 2018 the transformation of weighted likelihood estimates to an international metric resulted in reporting scales with an ICILS 2018 average of 50 and a standard deviation of 10 for equally weighted datasets from the countries that met sample participation requirements. This is achieved by applying the following formula:

$$\theta'_n = 50 + 10 \left( \frac{\theta_n - \mu_{\theta(ICILS18)}}{\sigma_{\theta(ICILS18)}} \right)$$

where $\theta'_n$ are the scores in the international metric, $\theta_n$ are the original weighted likelihood estimates in logits, and $\mu_{\theta(ICILS18)}$ is the international mean of logit scores with equally weighted country subsamples. $\sigma_{\theta(ICILS18)}$ is the corresponding international standard deviation of the original weighted likelihood estimates.
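The transformation is a linear rescaling. This minimal Python sketch copies three pairs of transformation parameters from Table 12.1; the function and dictionary names are illustrative.

```python
from typing import Dict, Tuple

# (international mean, international SD) of the original logit scores, from Table 12.1.
TABLE_12_1: Dict[str, Tuple[float, float]] = {
    "S_GENACT": (-0.63, 1.49),
    "S_USESTD": (-0.51, 1.03),
    "T_VWPOS": (1.78, 2.13),
}


def to_reporting_metric(theta: float, intl_mean: float, intl_sd: float) -> float:
    """Map a WLE logit onto the ICILS 2018 reporting metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * (theta - intl_mean) / intl_sd


def scale_score(scale: str, theta: float) -> float:
    mean, sd = TABLE_12_1[scale]
    return to_reporting_metric(theta, mean, sd)


print(round(scale_score("S_GENACT", 0.86), 2))
```

A logit of 0.86 on S_GENACT, one international standard deviation above the international mean of -0.63, maps to a score of about 60.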


Table 12.1 presents the means and standard deviations used to transform the original scale scores 
for the student, teacher, ICT coordinator, and school principal questionnaires into the international 
metric. 


Table 12.1: Transformation parameters for new ICILS 2018 questionnaire scales (means and standard deviations of original IRT logit scores)

International student questionnaire
Scale Mean SD
S_GENACT -0.63 1.49
S_SPECACT -0.89 0.97
S_USECOM 0.43 0.88
S_USEINF -1.23 0.97
S_ACCONT 0.26 1.05
S_USESTD -0.51 1.03
S_SPECLASS -2.3 1.69
S_GENCLASS -0.97 1.95
S_ICTLRN 0.66 1.69
S_GENEFF 1.86 1.36
S_SPECEFF -0.1 1.61
S_ICTPOS 1.61 1.79
S_ICTNEG 0.66 1.26
S_ICTFUT 0.25 2.00
S_CODLRN 0.25 1.85

ICT coordinator questionnaire
Scale Mean SD
C_HINPED 0.23 1.41
C_HINRES -0.35 1.48

Teacher questionnaire
Scale Mean SD
T_CLASACT -0.47 1.88
T_CODEMP 0.36 1.83
T_COLICT 1.00 2.59
T_ICTEFF 2.32 1.47
T_ICTEMP 1.00 1.94
T_ICTPRAC -0.56 1.80
T_PROFREC -0.41 1.9
T_PROFSTR -1.16 1.55
T_RESRC 0.19 1.90
T_USETOOL -1.85 1.43
T_USEUTIL -0.23 1.47
T_VWNEG 0.08 1.37
T_VWPOS 1.78 2.13

School principal questionnaire
Scale Mean SD
P_EXPLR 0.90 1.90
P_EXPTCH 1.32 2.33
P_ICTCO 0.97 1.47
P_ICTUSE 0.05 0.94
P_PRIORH 1.54 1.78
P_PRIORS …52 1.64
P_VWICT 3.63 2.24
Describing questionnaire scale indices


Questionnaire scales derived from weighted likelihood estimates (logits) present values on a continuum with an ICILS 2018 average of 50 and a standard deviation of 10 (for equally weighted national samples). This allows an interpretation of these scores by comparing individual scores or group average scores with the ICILS 2018 average, but the individual scores do not reveal anything about the actual item responses and the extent to which respondents endorsed the items used to measure the latent variable. The scaling model used to derive individual scores allows the development of descriptions of these scales through a mapping of scale scores to (expected) item responses.⁹


It is possible to describe item characteristics by using the parameters of the partial credit model to provide an estimate for each category of its probability of being chosen relative to the probabilities of all higher categories. This process is equivalent to computing the odds of scoring higher than a particular category.


Figure 12.1 presents the results of plotting these cumulative probabilities against scale scores for a fictitious item, where respondents rate their agreement or disagreement with a statement on a four-point scale. The three vertical lines denote those points on the latent continuum where it becomes more likely to score > 0, > 1, or > 2. These locations $T_k$ are Thurstonian thresholds, which can be obtained through an iterative procedure that calculates summed probabilities for each category at each (decimal) point on the latent variable.
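Thurstonian thresholds can be located numerically. The text describes evaluating summed probabilities at each decimal point of the latent variable; this minimal Python sketch swaps that grid search for bisection, which relies on $P(X \ge k)$ increasing with $\theta$ under the partial credit model. It reuses the “+ τ” form written earlier, so the step difficulties are $\delta_i - \tau_{ij}$, and the item parameters are hypothetical.

```python
import math
from typing import List, Sequence


def pcm_probabilities(theta: float, delta: float, taus: Sequence[float]) -> List[float]:
    """Category probabilities under the PCM in the '+ tau' form."""
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + theta - delta + tau)
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def thurstonian_thresholds(delta: float, taus: Sequence[float], lo: float = -10.0,
                           hi: float = 10.0, tol: float = 1e-7) -> List[float]:
    """For k = 1..m, the theta at which P(X >= k) = 0.5."""
    thresholds = []
    for k in range(1, len(taus) + 1):
        a, b = lo, hi
        while b - a > tol:
            mid = 0.5 * (a + b)
            if sum(pcm_probabilities(mid, delta, taus)[k:]) < 0.5:
                a = mid  # summed probability still below 0.5: move right
            else:
                b = mid
        thresholds.append(0.5 * (a + b))
    return thresholds


print([round(t, 3) for t in thurstonian_thresholds(0.0, [1.0, 0.0, -1.0])])
```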


Figure 12.1: Summed category probabilities for fictitious item

[Figure: summed probabilities (0.0 to 1.0) plotted against THETA (-4.00 to 4.00) for the categories strongly disagree, disagree, agree, and strongly agree; three vertical lines mark the Thurstonian thresholds]


9 This approach was also used in the IEA ICCS 2009 and 2016 surveys (see Schulz and Friedman 2011, 2018) and the 
ICILS 2013 survey (see Schulz and Friedman 2015). 
Summed probabilities are not identical to expected item scores and have to be understood in terms of the probability of scoring at least a particular category.¹⁰ Thurstonian thresholds can be used to indicate for each item category those points on a scale at which respondents have a 0.5 probability of scoring in this category or higher. For example, in the case of Likert-type items with the categories strongly disagree, disagree, agree, and strongly agree, we can determine at what point of a scale a respondent has a 50 percent likelihood of agreeing (or strongly agreeing) with the item.


The item-by-score maps included in ICILS 2018 reports predict the minimum coded score (e.g., 0 = “strongly disagree,” 1 = “disagree,” 2 = “agree,” and 3 = “strongly agree”) a respondent would obtain on a Likert-type item. For example, we could predict that students with a certain scale score would have a 50 percent probability of agreeing (or strongly agreeing) with a particular item (see the example item-by-score map in Figure 12.2). For each item, it is thus possible to determine Thurstonian thresholds: the points at which a minimum item score becomes more likely than any lower score to occur and which determine the boundaries between item categories on the item-by-score map.


Figure 12.2: Example of questionnaire item-by-score map

[Figure: horizontal bars for Item #1 to Item #3 plotted against scale scores from 20 to 80 (mean = 50, standard deviation = 10), shaded by category: strongly disagree, disagree, agree, strongly agree]


Example of how to interpret the item-by-score map

#1: A respondent with score 30 has more than 50% probability to strongly disagree with all three items

#2: A respondent with score 40 has more than 50% probability not to strongly disagree with items 1 and 2 but to strongly disagree with item 3

#3: A respondent with score 50 has more than 50% probability to agree with item 1 and to disagree with items 2 and 3

#4: A respondent with score 60 has more than 50% probability to strongly agree with item 1 and to at least agree with items 2 and 3

#5: A respondent with score 70 has more than 50% probability to strongly agree with items 1, 2, and 3


10 Other ways of describing item characteristics based on the partial credit model are item characteristic curves, which 
involve plotting the individual category probabilities and the expected item score curves (for a detailed description, see 
Masters and Wright 1997). 
This information can also be summarized by calculating the average thresholds across all items in a scale. For example, it is possible to do this for the second threshold of a four-point Likert-type scale, which allows the prediction of how likely it would be for a respondent with a certain scale score to have responses in the two lower or upper categories (on average across items). For ICILS 2018 we used this approach in the case of items measuring agreement to distinguish between scale scores for respondents who were most likely to agree or disagree with the “average item” used for measuring the respective latent trait.
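The “average item” boundary can be sketched as follows, assuming the second Thurstonian thresholds of the items have already been computed in logits and that scores exactly on the boundary count as agreement (the text does not specify tie handling). The threshold values in the example are hypothetical; real transformation parameters would come from Table 12.1.

```python
from statistics import fmean
from typing import Sequence


def average_item_cut_point(second_thresholds: Sequence[float],
                           intl_mean: float, intl_sd: float) -> float:
    """Average the items' second thresholds (logits) and express the result
    on the reporting metric (mean 50, SD 10)."""
    avg_logit = fmean(second_thresholds)
    return 50.0 + 10.0 * (avg_logit - intl_mean) / intl_sd


def response_region(scale_score: float, cut_point: float) -> str:
    """Side of the 'average item' boundary on which a (national average) score falls."""
    if scale_score >= cut_point:
        return "agree or strongly agree"
    return "disagree or strongly disagree"


cut = average_item_cut_point([0.1, 0.3, 0.2], intl_mean=0.2, intl_sd=1.0)
print(round(cut, 2), response_region(55.0, cut))
```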


In the reporting tables for questionnaire scales, we depicted national average student or teacher scale scores as boxes that indicated their mean values plus/minus sampling error and that were set in graphical displays featuring two underlying colors. National average scores located in the darker shaded area indicated that, on average across items, student or teacher responses resided in the lower item categories (“disagree or strongly disagree” or “less than once a week”). If these scores were found in the lighter shaded area, however, then students’ or teachers’ average item responses were in the upper item response categories (“agree or strongly agree” or “at least once a week”).




Scaled indices 


Student questionnaire 

National index of students’ socioeconomic background 

The multivariate analyses presented in the international report (Fraillon et al. 2020) include a composite index reflecting students’ socioeconomic background. The national index of students’ socioeconomic background (S_NISB) was derived from the following three indices: highest occupational status of parents (S_HISEI), highest educational level of parents (S_HISCED), and the number of books at home (S_HOMLIT). For the S_HISCED index, we collapsed the lowest two categories to have an indicator variable with four categories: lower-secondary or below, upper-secondary, tertiary non-university, and university education. The S_HOMLIT index was reduced from five to four categories (0 to 10 books; 11 to 25 books; 26 to 100 books; more than 100 books), collapsing the two highest categories. This was done for both indices on parental education and home literacy as prior analyses had shown approximately linear associations across these categories with CIL and CT scores as well as other indicators of socioeconomic background.


In order to impute values for students who had missing data for one of the three indicators, we used predicted values plus a random component based on a regression on the other two variables that had been estimated for students with values on all three variables. This imputation procedure was carried out for each national sample separately.
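The imputation step can be sketched for one national sample. This minimal Python version is an illustration under stated assumptions: it fits an ordinary least-squares regression of the missing indicator on the other two among complete cases, draws the random component from a normal distribution with the residual standard deviation, ignores sampling weights, and handles only students missing exactly one indicator. Record keys reuse the index names from the text; the data in the usage test are invented.

```python
import math
import random
from statistics import fmean
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[float]]


def fit_two_predictor_ols(y: Sequence[float], x1: Sequence[float],
                          x2: Sequence[float]) -> Tuple[float, float, float, float]:
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; also returns the residual SD."""
    if len(y) <= 3:
        raise ValueError("need more than three complete cases")
    my, m1, m2 = fmean(y), fmean(x1), fmean(x2)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    sse = sum((c - (b0 + b1 * a + b2 * b)) ** 2 for a, b, c in zip(x1, x2, y))
    return b0, b1, b2, math.sqrt(sse / (len(y) - 3))


def impute_indicator(records: List[Record], target: str, predictors: Sequence[str],
                     rng: random.Random) -> None:
    """Fill missing `target` values from the two other indicators (in place)."""
    p1, p2 = predictors
    complete = [r for r in records if None not in (r[target], r[p1], r[p2])]
    b0, b1, b2, sd = fit_two_predictor_ols([r[target] for r in complete],
                                           [r[p1] for r in complete],
                                           [r[p2] for r in complete])
    for r in records:
        if r[target] is None and r[p1] is not None and r[p2] is not None:
            # Predicted value plus a normally distributed random component.
            r[target] = b0 + b1 * r[p1] + b2 * r[p2] + rng.gauss(0.0, sd)
```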


After converting the resulting variables including the imputed values into z-standardized variables (with a mean of 0 and a standard deviation of 1 for each national dataset), principal component analyses of these indicator variables were conducted separately for each weighted national sample.


The final S_NISB scores consist of factor scores for the first principal component with national averages of 0 and national standard deviations of 1. Table 12.2 shows the factor loadings and reliabilities for each national sample.
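The principal component step can be illustrated with a small pure-Python sketch. It is unweighted (the ICILS 2018 analyses used weighted national samples), finds the leading eigenvector of the correlation matrix by power iteration (which assumes positively correlated indicators), reports loadings as eigenvector elements scaled by the square root of the eigenvalue, and standardizes the component scores to mean 0 and standard deviation 1. Function names and data are hypothetical.

```python
import math
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def zscores(values: Sequence[float]) -> List[float]:
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]


def first_principal_component(columns: Sequence[Sequence[float]],
                              iterations: int = 200) -> Tuple[List[float], List[float]]:
    """Return (loadings, standardized scores) for the first principal component."""
    z = [zscores(col) for col in columns]
    k, n = len(z), len(z[0])
    # Correlation matrix of the z-standardized indicators.
    corr = [[sum(z[a][i] * z[b][i] for i in range(n)) / n for b in range(k)]
            for a in range(k)]
    vec = [1.0] * k
    for _ in range(iterations):  # power iteration for the leading eigenvector
        new = [sum(corr[a][b] * vec[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    eigenvalue = sum(vec[a] * corr[a][b] * vec[b] for a in range(k) for b in range(k))
    loadings = [v * math.sqrt(eigenvalue) for v in vec]
    raw = [sum(vec[a] * z[a][i] for a in range(k)) for i in range(n)]
    return loadings, zscores(raw)  # rescaled to mean 0 and SD 1


occupation = [1, 3, 2, 5, 4, 6]
education = [2, 2, 3, 4, 6, 5]
books = [1, 2, 2, 3, 3, 5]
loadings, scores = first_principal_component([occupation, education, books])
print([round(l, 2) for l in loadings])
```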


Students’ use of ICT for different activities 

Question 19 of the student questionnaire asked students to indicate their use of ICT for a range of different activities. For each of the eight activities, they were asked to select “never,” “less than once a month,” “at least once a month but not every week,” “at least once a week but not every day,” or “every day.” The items in this question were used to derive two scales. The first one reflected students’ use of general applications for activities (S_GENACT) as based on the first three items of the question and had an average reliability of 0.70 across the national samples, with Cronbach’s alpha coefficients ranging from 0.63 to 0.84 (see Table 12.3). The second scale reflected students’ use of specialist applications for activities (S_SPECACT). It was derived from the remaining five items in the question with reliabilities ranging from 0.65 to 0.82, with an average of 0.73. Higher values on each scale reflect more frequent use of ICT for the corresponding activities.

Table 12.2: Factor loadings and reliabilities for the national index of students’ socioeconomic background

Country | Factor loading: highest parental occupation | Factor loading: highest parental education | Factor loading: books at home | Cronbach’s alpha
Chile | 0.90 | 0.88 | 0.70 | 0.77
Denmark | 0.79 | 0.79 | 0.72 | 0.65
Finland | 0.80 | 0.76 | 0.56 | 0.52
France | 0.80 | 0.78 | 0.69 | 0.63
Germany | 0.82 | 0.72 | 0.66 | 0.61
Italy | 0.82 | 0.82 | 0.72 | 0.70
Kazakhstan | 0.78 | 0.78 | 0.57 | 0.51
Korea, Republic of | 0.72 | 0.75 | 0.64 | 0.51
Luxembourg | 0.81 | 0.79 | 0.75 | 0.67
Moscow (Russian Federation) | 0.76 | 0.78 | 0.64 | 0.55
North Rhine-Westphalia (Germany) | 0.81 | 0.68 | 0.69 | 0.59
Portugal | 0.84 | 0.86 | 0.74 | 0.74
United States | 0.80 | 0.83 | 0.68 | 0.66
Uruguay | 0.82 | 0.82 | 0.75 | 0.71
ICILS 2018 average | 0.80 | 0.78 | 0.67 | 0.62

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.
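Cronbach’s alpha, used for the scale reliabilities in this chapter, compares the sum of the item variances with the variance of the total score: $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_X^2}\right)$. The following is a minimal, unweighted Python sketch; the data layout (one list of scores per item) is an assumption.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k items, each a sequence of scores for the same respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    sum_item_var = sum(pvariance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1.0 - sum_item_var / pvariance(totals))


print(cronbach_alpha([[1, 2, 3, 4]] * 3))             # identical items
print(cronbach_alpha([[1, 2, 1, 2], [1, 1, 2, 2]]))   # uncorrelated items
```

Three identical items give an alpha of 1, while two uncorrelated items give an alpha of 0.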


Figure 12.3 depicts the results from the confirmatory factor analysis of the scaled items from the question. The two-factor model had a satisfactory fit, and we found a high positive correlation between the two latent factors. When reviewing measurement invariance using multiple-group models with different constraints, the model fit changed only marginally, which indicates a relatively high degree of invariance for this model. The reliabilities were satisfactory for all countries. The item parameters for both scales that were used to derive the IRT scale scores are presented in Table 12.4.
Figure 12.3: Confirmatory factor analysis of items measuring students’ use of ICT for activities

Model fit indices: Pooled sample; Multiple-group models (Configural, Metric, Scalar)
Index Pooled sample Configural Metric Scalar
RMSEA 0.060 0.077 0.096 0.091
CFI 0.98 0.97 0.93 0.89
TLI 0.96 0.95 0.92 0.93


Table 12.3: Reliabilities for scales measuring students’ participation in out-of-school activities 


Country Scale reliability (Cronbach’s alpha)
S_GENACT S_SPECACT
Chile 0.68 0.70
Denmark 0.70 0.74
Finland 0.71 0.73
France 0.65 0.71
Germany 0.69 0.70
Italy 0.63 0.71
Kazakhstan 0.75 0.78
Korea, Republic of 0.84 0.82
Luxembourg 0.68 0.75
Moscow (Russian Federation) 0.72 0.75
North Rhine-Westphalia (Germany) 0.69 0.65
Portugal 0.69 0.73
United States 0.67 0.70
Uruguay 0.71 0.70
ICILS 2018 average 0.70 0.73

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.
Table 12.4: Item parameters for scales measuring students’ use of ICT for activities

Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2) Tau(3) Tau(4)
S_GENACT How often do you use ICT for each of the following activities?
S2G19A Write or edit documents -0.42 -1.89 -0.57 0.34 2.12
S2G19B Use a spreadsheet to do calculations, store data or plot graphs (e.g. using [Microsoft Excel ®]) 0.33 -1.61 -0.46 0.00 2.08
S2G19C Create a simple “slideshow” presentation (e.g. using [Microsoft PowerPoint ®]) 0.10 -2.61 -0.71 0.67 2.65
S_SPECACT How often do you use ICT for each of the following activities?
S2G19D Record or edit videos -0.44 -0.78 -0.17 -0.09 1.03
S2G19E Write computer programs, scripts or apps (e.g. using [Logo, LUA, or Scratch]) 0.29 -0.49 -0.22 -0.19 0.90
S2G19F Use drawing, painting or graphics software or [apps] -0.18 -0.64 -0.05 -0.03 0.71
S2G19G Produce or edit music -0.14 0.34 -0.13 -0.22 0.01
S2G19H Build or edit a webpage 0.46 0.16 -0.20 -0.26 0.30


Students’ use of ICT for communication activities 


In Question 20, respondents were asked to indicate the frequency with which they use ICT for 
Figure 12.4: Confirmatory factor analysis of items measuring students’ use of ICT for communication activities

Model fit indices: Pooled sample; Multiple-group models (Configural, Metric, Scalar)
Index Pooled sample Configural Metric Scalar
RMSEA 0.090 0.107 0.109 0.108
CFI 0.93 0.92 0.90 0.84
TLI 0.91 0.90 0.89 0.89
Table 12.5: Reliabilities for scales measuring students’ use of ICT for communication activities
Country Scale reliability (Cronbach’s alpha)
S_USECOM S_USEINF
Chile 0.80 0.73
Denmark 0.73 0.75
Finland 0.74 0.73
France 0.80 0.73
Germany 0.73 0.76
Italy 0.77 0.74
Kazakhstan 0.82 0.77
Korea, Republic of 0.81 0.80
Luxembourg 0.76 0.80
Moscow (Russian Federation) 0.76 0.75
North Rhine-Westphalia (Germany) 0.72 0.73
Portugal 0.76 0.76
United States 0.82 0.76
Uruguay 0.78 0.72
ICILS 2018 average 0.77 0.75

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.6: Item parameters for scales measuring students’ use of ICT for communication activities

Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2) Tau(3) Tau(4)
S_USECOM How often do you use ICT to do each of the following communication activities?
S2G20A Share news about current events on social media 0.51 -0.14 0.10 -0.30 0.34
S2G20B Communicate with friends, family, or other people using instant messaging, voice or video chat (e.g. [Skype, WhatsApp, Viber]) -0.78 0.06 0.34 -0.10 -0.30
S2G20C Send texts or instant messages to friends, family, or other people -0.71 0.15 0.38 -0.15 -0.38
S2G20D Write posts and updates about what happens in your life on social media 0.79 -0.09 -0.13 -0.26 0.48
S2G20H Post images or video in social networks or online communities (e.g. [Facebook, Instagram or YouTube]) 0.44 -0.49 -0.21 0.04 0.67
S2G20I Watch videos or images that other people have posted online -0.78 0.12 0.34 -0.12 -0.34
S2G20J Send or forward information about events or activities to other people 0.52 -0.37 -0.20 -0.09 0.66
S_USEINF How often do you use ICT to do each of the following communication activities?
S2G20E Ask questions on forums or [Q&A, question and answer] websites -0.11 -0.55 -0.49 0.08 0.97
S2G20F Answer other people’s questions on forums or [Q&A, question and answer] websites -0.10 -0.37 -0.43 0.03 0.77
S2G20G Write posts for your own blog (e.g. [WordPress, Tumblr, Blogger]) 0.21 0.34 -0.49 -0.28 0.43
Figure 12.5: Confirmatory factor analysis of items measuring students’ use of ICT for leisure activities

Model fit indices: Pooled sample; Multiple-group models (Configural, Metric, Scalar)
Index Pooled sample Configural Metric Scalar
RMSEA 0.098 0.102 0.079 0.094
CFI 0.97 0.97 0.97 0.90
TLI 0.95 0.95 0.97 0.95
Table 12.7: Reliabilities for scale measuring students’ use of ICT for leisure activities

Country Scale reliability (Cronbach’s alpha)
S_ACCONT
Chile 0.76
Denmark 0.73
Finland 0.78
France 0.76
Germany 0.72
Italy 0.73
Kazakhstan 0.79
Korea, Republic of 0.83
Luxembourg 0.74
Moscow (Russian Federation) 0.68
North Rhine-Westphalia (Germany) 0.73
Portugal 0.74
United States 0.79
Uruguay 0.76
ICILS 2018 average 0.75

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.8: Item parameters for scale measuring students’ use of ICT for leisure activities

Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2) Tau(3) Tau(4)
S_ACCONT How often do you use ICT to do each of the following leisure activities?
S2G21A Search the Internet to find information about places to go or activities to do 0.29 -1.26 -0.25 0.17 1.34
S2G21B Read reviews on the Internet of things you might want to buy 0.30 -0.82 -0.41 0.05 …
S2G21C Read news stories on the Internet 0.00 -0.57 -0.09 -0.03 0.69
S2G21D Search for online information about things you are interested in -0.60 -0.66 -0.31 -0.09 1.06
S2G21E Use websites, forums, or online videos to find out how to do something 0.01 -0.65 -0.46 -0.04 1.15
In Question 22, students were asked to indicate the frequency that they use ICT for 10 items on different school-related purposes. For each item, they were asked to select “never,” “less than once a month,” “at least once a month but not every week,” “at least once a week but not every school day,” or “every school day.” All items in the question were used to derive a scale of students’ use of ICT for study purposes (S_USESTD), where higher scale scores reflect more frequent use of ICT. The results from a confirmatory factor analysis for the model are presented in Figure 12.6.

Overall, the model showed only marginally satisfactory fit, and reviews of multiple-group model comparisons show that there were differences in model fit between the metric and scalar invariance model that suggest there was some variation in measurement characteristics. The reliabilities (Cronbach’s alpha) of the scale are presented in Table 12.9 and show a high average reliability of 0.83, ranging between 0.77 and 0.89 across countries. The item parameters for the scale are presented in Table 12.10.


Figure 12.6: Confirmatory factor analysis of items measuring students’ use of ICT for study purposes 
Model fit indices: Pooled sample; Multiple-group models (Configural, Metric, Scalar)
Index Pooled sample Configural Metric Scalar
RMSEA 0.072 0.086 0.097 0.108
CFI 0.97 0.96 0.93 0.86
TLI 0.96 0.94 0.92 0.91
Table 12.9: Reliabilities for scale measuring students’ use of ICT for study purposes

Country Scale reliability (Cronbach’s alpha)
S_USESTD
Chile 0.82
Denmark 0.77
Finland 0.86
France 0.81
Germany 0.82
Italy 0.80
Kazakhstan 0.87
Korea, Republic of 0.89
Luxembourg 0.84
Moscow (Russian Federation) 0.82
North Rhine-Westphalia (Germany) 0.81
Portugal 0.84
United States 0.84
Uruguay 0.84
ICILS 2018 average 0.83

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.10: Item parameters for scale measuring students’ use of ICT for study purposes

Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2) Tau(3) Tau(4)
S_USESTD How often do you use ICT for the following school-related purposes?
S2G22A Prepare reports or essays -0.08 -1.38 -0.51 0.23 1.66
S2G22B Prepare presentations -0.06 -1.87 -0.55 0.49 1.93
S2G22C Work online with other students 0.08 -0.46 -0.26 -0.11 0.83
S2G22D Complete [worksheets] or exercises -0.09 -0.60 -0.22 -0.11 0.94
S2G22E Organize your time and work -0.03 -0.10 -0.24 -0.13 0.47
S2G22F Take tests 0.28 -0.71 -0.48 -0.07 1.27
S2G22G Use software or applications to learn skills or a subject (e.g. mathematics tutoring software, language learning software) 0.11 -0.54 -0.40 -0.08 1.03
S2G22H Use the Internet to do research -1.12 -0.92 -0.36 0.20 1.08
S2G22I Use coding software to complete assignments (e.g. [Scratch]) 0.57 -0.25 -0.46 -0.15 0.86
S2G22J Make video or audio productions 0.33 -0.17 -0.05 -0.12 0.34
Question 24 required students to indicate how often they use different ICT-related tools during class (selecting from “never,” “in some lessons,” “in most lessons,” or “in every or almost every lesson”). Three of the items were used to derive a scale on students’ use of general applications in class (S_GENCLASS) and six of the items were used to derive a scale of students’ use of specialist applications in class (S_SPECLASS). Higher scale scores reflect more frequent use of the respective type of ICT applications for class activities.

A confirmatory factor analysis using all scaled items from this question (see Figure 12.7) revealed satisfactory fit for a two-dimensional model with a strong positive correlation between the two latent factors (0.66). The fit of the model was also acceptable when comparing multiple-group models with different constraints, which suggests a relatively high degree of measurement invariance. The national reliabilities for the two scales are presented in Table 12.11. Both scales had acceptable reliabilities (Cronbach’s alpha) with an average of 0.72 and 0.84 across countries respectively (ranging from 0.53 to 0.81 for S_GENCLASS and ranging from 0.78 to 0.92 for S_SPECLASS). The item parameters for both scales are displayed in Table 12.12.


Figure 12.7: Confirmatory factor analysis of items measuring students’ reports on the use of ICT for class activities
Model fit indices: Pooled sample; Multiple-group models (Configural, Metric, Scalar)
Index Pooled sample Configural Metric Scalar
RMSEA 0.075 0.087 0.096 0.085
CFI 0.97 0.97 0.96 0.96
TLI 0.97 0.96 0.96 0.97
Table 12.11: Reliabilities for scales measuring students’ reports on the use of ICT for class activities

Scale reliability (Cronbach’s alpha)
Country    S_GENCLASS    S_SPECLASS
Chile    0.76    0.84
Denmark    0.53    0.82
Finland    0.70    0.78
France    0.73    0.78
Germany    0.67    0.82
Italy    0.74    0.82
Kazakhstan    0.76    0.87
Korea, Republic of    0.81    0.92
Luxembourg    0.76    0.89
Moscow (Russian Federation)    0.73    0.82
North Rhine-Westphalia (Germany)    0.69    0.84
Portugal    0.78    0.86
United States    0.70    0.85
Uruguay    0.72    0.86
ICILS 2018 average    0.72    0.84
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.
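The table note states that the ICILS 2018 average excludes benchmarking participants. The following minimal Python sketch reproduces the tabled averages of 0.72 and 0.84 from the national values. It rests on two assumptions: that the average is an unweighted mean of the national coefficients, and that the two benchmarking participants are Moscow (Russian Federation) and North Rhine-Westphalia (Germany). The ranges quoted in the text appear to include the benchmarking participants. For example, the 0.62 lower bound given later for S_ICTNEG comes from North Rhine-Westphalia.

```python
# National Cronbach's alpha values from Table 12.11: (S_GENCLASS, S_SPECLASS).
ALPHAS = {
    "Chile": (0.76, 0.84),
    "Denmark": (0.53, 0.82),
    "Finland": (0.70, 0.78),
    "France": (0.73, 0.78),
    "Germany": (0.67, 0.82),
    "Italy": (0.74, 0.82),
    "Kazakhstan": (0.76, 0.87),
    "Korea, Republic of": (0.81, 0.92),
    "Luxembourg": (0.76, 0.89),
    "Moscow (Russian Federation)": (0.73, 0.82),
    "North Rhine-Westphalia (Germany)": (0.69, 0.84),
    "Portugal": (0.78, 0.86),
    "United States": (0.70, 0.85),
    "Uruguay": (0.72, 0.86),
}

# Assumption: these are the benchmarking participants (italics in the original table).
BENCHMARKING = {"Moscow (Russian Federation)", "North Rhine-Westphalia (Germany)"}


def icils_average(alphas, column, exclude=BENCHMARKING):
    """Unweighted mean of one column, skipping benchmarking participants (assumed rule)."""
    values = [row[column] for country, row in alphas.items() if country not in exclude]
    return sum(values) / len(values)


def reported_range(alphas, column):
    """Minimum and maximum over all rows, benchmarking participants included."""
    values = [row[column] for row in alphas.values()]
    return min(values), max(values)


if __name__ == "__main__":
    for column, name in enumerate(("S_GENCLASS", "S_SPECLASS")):
        low, high = reported_range(ALPHAS, column)
        mean = icils_average(ALPHAS, column)
        print(f"{name}: average {mean:.3f}, range {low:.2f} to {high:.2f}")
```

An average recomputed from rounded national values can differ from a tabled average by about 0.01. A small mismatch of that size between text and table is therefore not necessarily an error.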


Table 12.12: Item parameters for scales measuring students’ reports on the use of ICT for class activities

Scale or item    Question/item wording    Item parameters: Delta    Tau(1)    Tau(2)    Tau(3)
S_GENCLASS    When studying throughout this school year, how often did you use the following tools during class?
S2G24B    Word-processing software (e.g. [Microsoft Word ®])    -0.14    -2.51    0.61    1.90
S2G24C    Presentation software (e.g. [Microsoft PowerPoint ®])    0.03    -2.98    0.53    2.45
S2G24    Computer-based information resources (e.g. websites, wikis, encyclopaedia)    0.12    -2.26    0.17    2.08
S_SPECLASS    When studying throughout this school year, how often did you use the following tools during class?
S2G24E    Multimedia production tools (e.g. media capture and editing, web production)    -0.07    -1.86    0.39    1.47
S2G24F    Concept mapping software (e.g. [Inspiration ®], [Webspiration ®])    0.37    -1.67    0.19    1.48
S2G24G    Tools that capture real-world data (e.g. speed, temperature) digitally for analysis    0.16    -1.82    0.33    1.49
S2G24H    Simulations and modelling software    0.35    -1.68    0.26    1.42
S2G24    Interactive digital learning resources (e.g. learning games or applications)    -0.48    -2.06    0.45    1.61
S2G24K    Graphing or drawing software    -0.33    -1.82    0.31    11
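In the partial credit parameterization used for these tables, each item has a location (Delta) and step parameters (Tau). A common identification constraint makes the taus of an item sum to zero, and the values tabled here satisfy that constraint to within two-decimal rounding. This gives a quick sanity check when transcribing or re-keying parameters. The minimal sketch below assumes the sum-to-zero constraint, which is inferred from the tabled values rather than stated in this section, and uses a tolerance of 0.01.

```python
# S_SPECLASS item parameters transcribed from Table 12.12:
# item -> (delta, (tau1, tau2, tau3)).
S_SPECLASS = {
    "S2G24E": (-0.07, (-1.86, 0.39, 1.47)),
    "S2G24F": (0.37, (-1.67, 0.19, 1.48)),
    "S2G24G": (0.16, (-1.82, 0.33, 1.49)),
    "S2G24H": (0.35, (-1.68, 0.26, 1.42)),
}


def tau_sum_violations(params, tol=0.01):
    """Return {item: rounded tau sum} for items whose taus do not sum to ~0.

    Assumes the identification constraint sum_j tau_ij = 0; a two-decimal
    table can legitimately be off by about 0.01 through rounding.
    """
    bad = {}
    for item, (_delta, taus) in params.items():
        total = sum(taus)
        if abs(total) > tol + 1e-9:  # the small epsilon absorbs float noise
            bad[item] = round(total, 2)
    return bad


if __name__ == "__main__":
    print(tau_sum_violations(S_SPECLASS) or "all step parameters sum to ~0")
    # A hypothetical keying error (1.74 typed instead of 1.47) is flagged:
    print(tau_sum_violations({"S2G24E": (-0.07, (-1.86, 0.39, 1.74))}))
```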
Students’ perceptions of school learning of ICT and coding tasks


In Question 25, students were asked to indicate the extent to which they had learned about eight different tasks at school that are related to using ICT. They had to select between four different options for each task (“to a large extent,” “to a moderate extent,” “to a small extent,” or “not at all”). The items were used to derive a scale of students’ learning of ICT tasks at school (S_ICTLRN). Higher scale scores corresponded to higher perceived learning of different tasks involving ICT.

Question 29 required students to indicate the extent to which they had been taught how to do tasks that are related to coding (selecting from the same response options as in Question 25). The nine items in the scale were used to derive a scale of students’ learning of ICT coding tasks at school (S_CODLRN).


Figure 12.8 shows the results from a confirmatory factor analysis of all scaled items measuring students’ perceptions of school learning of ICT and coding tasks. The model fit was satisfactory and a moderate correlation was observed between the two latent factors (0.51). When reviewing measurement invariance using multiple-group models with different constraints, the model fit changed only marginally, which indicates a relatively high degree of invariance for this model.


Figure 12.8: Confirmatory factor analysis of items measuring students’ perceptions of school learning of ICT and coding tasks

Model fit indices    Pooled sample    Multiple-group models: Configural    Metric    Scalar
RMSEA    0.063    0.071    N/A    0.078
CFI    0.97    0.97    N/A    0.94
TLI    0.96    0.96    N/A    0.95
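The three fit indices reported throughout this chapter are all functions of chi-square statistics. The minimal sketch below uses the conventional definitions. It assumes a baseline (independence) model for CFI and TLI and the N - 1 form of RMSEA; software packages differ slightly on such details. The chi-square values in the example are hypothetical and are not ICILS results.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention assumed)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_base = chi2_base / df_base
    return (ratio_base - chi2 / df) / (ratio_base - 1.0)


if __name__ == "__main__":
    # Hypothetical model and baseline statistics, for illustration only.
    model = dict(chi2=250.0, df=50)
    base = dict(chi2_base=5000.0, df_base=66)
    print(f"RMSEA = {rmsea(n=1001, **model):.3f}")
    print(f"CFI = {cfi(**model, **base):.3f}, TLI = {tli(**model, **base):.3f}")
```

RMSEA penalizes misfit per degree of freedom and per observation. CFI and TLI measure improvement over the baseline model, so the three indices can move in different directions as constraints are added.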
Table 12.13 shows the scale reliabilities (Cronbach’s alpha) for the two scales reflecting the extent of learning of ICT and coding tasks. The reliabilities were satisfactory for all countries. S_ICTLRN had a high level of reliability, with an average of 0.88 across countries (ranging from 0.83 to 0.94), and S_CODLRN also had a high average reliability of 0.90 across countries (ranging from 0.85 to 0.96). The item parameters for both scales that were used to derive the IRT scale scores are presented in Table 12.14.
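The reliabilities in these tables are Cronbach’s alpha coefficients:

alpha = k / (k - 1) * (1 - (sum of item variances) / (variance of the summed score))

Here k is the number of items. The minimal sketch below computes alpha from complete response vectors. Dropping respondents with any missing item (listwise deletion) is an assumption made for illustration and may differ from the procedure behind the tabled values.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores.

    `responses` is a list of equal-length sequences (one per respondent).
    Respondents with any missing item (None) are dropped listwise.
    """
    complete = [row for row in responses if None not in row]
    if len(complete) < 2 or len(complete[0]) < 2:
        raise ValueError("need at least two items and two complete respondents")
    k = len(complete[0])
    item_vars = [variance([row[i] for row in complete]) for i in range(k)]
    total_var = variance([sum(row) for row in complete])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical toy data: three items scored 0-3; the last row is dropped.
    data = [(3, 2, 3), (1, 1, 2), (2, 2, 2), (0, 1, 0), (None, 3, 3)]
    print(f"alpha = {cronbach_alpha(data):.2f}")
```

Because the ratio uses item and total variances computed the same way, the choice of sample (n - 1) or population (n) variance does not change alpha.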


Table 12.13: Reliabilities for scales measuring students’ perceptions of school learning of ICT and coding tasks

Scale reliability (Cronbach’s alpha)
Country    S_ICTLRN    S_CODLRN
Chile    0.86    0.90
Denmark    0.83    0.85
Finland    0.92    0.91
France    0.84    0.88
Germany    0.86    0.89
Italy    0.87    0.88
Kazakhstan    0.89    0.90
Korea, Republic of    0.94    0.96
Luxembourg    0.88    0.91
Moscow (Russian Federation)    0.94    0.91
North Rhine-Westphalia (Germany)    0.87    0.89
Portugal    0.91    0.92
United States    0.89    0.90
Uruguay    0.87    0.90
ICILS 2018 average    0.88    0.90
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.
Table 12.14: Item parameters for scales measuring students’ perceptions of school learning of ICT and coding tasks

Scale or item    Question/item wording    Item parameters: Delta    Tau(1)    Tau(2)    Tau(3)
S_ICTLRN    At school, to what extent have you learned how to do the following tasks?
S2G25A    Provide references to Internet sources    -0.04    -1.57    -0.32    1.88
S2G25B    Search for information using ICT    -0.38    -1.26    -0.34    1.60
S2G25C    Present information for a given audience or purpose using ICT    0.10    -1.35    -0.38    1.73
S2G25D    Work out whether to trust information from the Internet    0.08    -1.36    -0.30    1.66
S2G25E    Decide what information obtained from the Internet is relevant to include in school work    -0.03    -1.38    -0.37    1.75
S2G25F    Organize information obtained from Internet sources    -0.04    -1.50    -0.33    1.82
S2G25G    Decide where to look for information on the Internet about an unfamiliar topic    -0.01    -1.32    -0.32    1.64
S2G25H    Use ICT to collaborate with others    0.31    -1.12    -0.27    1.39
S_CODLRN    When studying during the current school year, to what extent have you been taught how to do the following tasks?
S2G29A    To display information in different ways    -0.89    -1.59    -0.49    2.08
S2G29B    To break a complex process into smaller parts    0.20    -1.56    -0.51    2.06
S2G29C    To understand diagrams that describe or show real-world problems    -0.30    -1.64    -0.29    1.93
S2G29D    To plan tasks by setting out the steps needed to complete them    -0.27    -1.57    -0.35    1.92
S2G29E    To use tools to make diagrams that help solve problems    0.01    -1.52    -0.30    1.82
S2G29F    To use simulations to help understand or solve real-world problems    0.57    -1.42    -0.39    1.84
S2G29G    To make flow diagrams to show the different parts of a process    0.68    -1.30    -0.32    1.62
S2G29H    To record and evaluate data to understand and solve a problem    0.01    -1.60    -0.31    1.91
S2G29I    To use real-world data to review and revise solutions to problems    0.00    -1.45    -0.35    1.80


Students’ self-efficacy 


Question 27 of the ICILS 2018 student questionnaire asked respondents to indicate how well they could do a series of tasks when using ICT (response categories were “I know how to do this,” “I have never done this but I could work out how to do this,” and “I do not think I could do this”). Twelve of the thirteen items from this question provided data for deriving two scales: students’ self-efficacy regarding the use of general applications (S_GENEFF) and students’ self-efficacy regarding the use of specialist applications (S_SPECEFF).


Figure 12.9 illustrates the results of the confirmatory factor analysis assuming a two-dimensional model with items from the two scales. The model fit was very good, and there was a moderate correlation between the two latent factors (0.49). When reviewing measurement invariance using multiple-group models with different constraints, the results showed that model fit remained equally satisfactory with more constrained models, suggesting a high degree of measurement invariance. S_GENEFF was derived from eight items and had an average reliability (Cronbach’s alpha) of 0.83, with coefficients ranging from 0.76 to 0.92 across participating countries (see Table 12.15). S_SPECEFF was based on four items and had an average scale reliability of 0.74, with coefficients ranging from 0.68 to 0.79. Higher values on these two scales represent a greater degree of self-efficacy. The item parameters for the two scales used for scaling are recorded in Table 12.16.
Figure 12.9: Confirmatory factor analysis of items measuring students’ ICT self-efficacy

Model fit indices    Pooled sample    Multiple-group models: Configural    Metric    Scalar
RMSEA    0.075    0.082    0.079    0.082
CFI    0.94    0.94    0.935    0.92
TLI    0.92    0.93    0.931    0.93
Table 12.15: Reliabilities for scales measuring students’ ICT self-efficacy

Scale reliability (Cronbach’s alpha)
Country    S_GENEFF    S_SPECEFF
Chile    0.81    0.73
Denmark    0.76    0.75
Finland    0.82    0.73
France    0.80    0.68
Germany    0.83    0.75
Italy    0.77    0.72
Kazakhstan    0.92    0.74
Korea, Republic of    0.89    0.79
Luxembourg    0.85    0.74
Moscow (Russian Federation)    0.89    0.71
North Rhine-Westphalia (Germany)    0.81    0.72
Portugal    0.81    0.73
United States    0.84    0.76
Uruguay    0.83    0.73
ICILS 2018 average    0.83    0.73
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.16: Item parameters for scales measuring students’ ICT self-efficacy

Scale or item    Question/item wording    Item parameters: Delta    Tau(1)    Tau(2)
S_GENEFF    How well can you do each of these tasks when using ICT?
S2G27A    Edit digital photographs or other graphic images    0.17    -0.57    0.57
S2G27C    Write or edit text for a school assignment    -0.28    -0.16    0.16
S2G27D    Search for and find relevant information for a school project on the Internet    -0.41    -0.16    0.16
S2G27    Create a multi-media presentation (with sound, pictures, or video)    0.65    -0.78    0.78
S2G27    Upload text, images, or video to an online profile    0.06    -0.44    0.44
S2G27    Insert an image into a document or message    -0.29    -0.12    0.12
S2G27L    Install a program or [app]    -0.20    -0.07    0.07
S2G27    Judge whether you can trust information you find on the Internet    0.30    -0.71    0.71
S_SPECEFF    How well can you do each of these tasks when using ICT?
S2G27B    Create a database (e.g. using [Microsoft Access ®])    -0.06    -1.15    1.15
S2G27E    Build or edit a webpage    -0.28    -1.17    1.17
S2G27G    Create a computer program, macro, or [app] (e.g. in [Basic, Visual Basic])    0.62    -1.04    1.04
S2G27H    Set up a local area network of computers or other ICT    -0.28    -0.69    0.69
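The Delta and Tau values in these tables are partial credit model parameters. The minimal sketch below computes the category probabilities for one item. It rests on two assumptions, neither taken from this section:

- Sign convention: the log-odds of category j versus j - 1 is theta - delta - tau_j, so adjacent categories are equally likely at theta = delta + tau_j.
- Scoring: the response categories are scored 0 (“I do not think I could do this”), 1, and 2 (“I know how to do this”).

The example uses item S2G27A from Table 12.16.

```python
import math


def pcm_probabilities(theta, delta, taus):
    """Category probabilities P(X = 0..m) under a partial credit model.

    Assumed parameterization: the log-odds of category j versus j - 1
    is theta - delta - taus[j - 1].
    """
    cumulative = [0.0]  # category 0 has an empty sum
    for tau in taus:
        cumulative.append(cumulative[-1] + (theta - delta - tau))
    top = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Item S2G27A (Table 12.16): delta = 0.17, taus = (-0.57, 0.57).
    for theta in (-2.0, 0.0, 2.0):
        probs = pcm_probabilities(theta, 0.17, (-0.57, 0.57))
        print(theta, [round(p, 3) for p in probs])
```

As theta increases, probability mass moves from the lowest category toward “I know how to do this.” At theta = 0.17 - 0.57 = -0.40, the two lowest categories are equally likely.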
Students’ perceptions of ICT


In Question 28 of the student questionnaire, respondents were asked to provide their level of agreement (“strongly agree,” “agree,” “disagree,” or “strongly disagree”) with a series of statements about ICT. The following scales were derived from the 11 items in the question:

• Students’ perceptions of positive outcomes of ICT for society (S_ICTPOS)
• Students’ perceptions of negative outcomes of ICT for society (S_ICTNEG)
• Students’ expectations of future ICT use for work and study (S_ICTFUT)


Higher scores on these scales corresponded to stronger views (whether positive views, negative views, or greater expectations). A confirmatory factor analysis of the items in the question (see Figure 12.10) supported a three-dimensional model. The model had a good fit, even when constraints were placed, which broadly suggests a relatively high level of measurement invariance. There was a strong positive correlation between the S_ICTPOS and S_ICTFUT latent factors, but S_ICTNEG had only weak correlations with the other two factors.
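Higher scores mean stronger views on each of the three scales. Agreement therefore counts toward a higher score on S_ICTNEG, just as it does on S_ICTPOS and S_ICTFUT: a student who strongly agrees that ICT may be dangerous for people’s health moves up on S_ICTNEG, not down. The minimal sketch below illustrates that direction of scoring. The 0 to 3 codes are an assumption, the item-to-scale assignment follows Table 12.18, and the raw sums only illustrate direction; ICILS reports IRT-based scale scores.

```python
# Assumed 0-3 codes: stronger agreement scores higher on every item, including
# the negatively worded S_ICTNEG statements (no reverse coding across scales).
AGREEMENT_CODES = {
    "strongly disagree": 0,
    "disagree": 1,
    "agree": 2,
    "strongly agree": 3,
}

# Item-to-scale assignment from Table 12.18 (S_ICTFUT omitted for brevity).
SCALE_ITEMS = {
    "S_ICTPOS": ("S2G28A", "S2G28B", "S2G28F", "S2G28G"),
    "S_ICTNEG": ("S2G28C", "S2G28D", "S2G28E", "S2G28H"),
}


def raw_scale_sums(answers):
    """Sum of coded responses per scale, ignoring unanswered items.

    Illustrative only: returns None for a scale with no answered items.
    """
    sums = {}
    for scale, items in SCALE_ITEMS.items():
        codes = [AGREEMENT_CODES[answers[item]] for item in items if item in answers]
        sums[scale] = sum(codes) if codes else None
    return sums


if __name__ == "__main__":
    student = {"S2G28A": "agree", "S2G28B": "strongly agree",
               "S2G28C": "strongly agree", "S2G28D": "disagree"}
    print(raw_scale_sums(student))
```

For the hypothetical student above, the sums are 5 on S_ICTPOS and 4 on S_ICTNEG.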
Figure 12.10: Confirmatory factor analysis of items measuring students’ perceptions of ICT 
Model fit indices    Pooled sample    Multiple-group models: Configural    Metric    Scalar
RMSEA    0.042    0.063    0.064    0.075
CFI    0.98    0.97    0.96    0.92
TLI    0.98    0.96    0.95    0.94
The average reliability (Cronbach’s alpha) across countries for S_ICTPOS was 0.75 (ranging from 0.67 to 0.85), for S_ICTNEG it was 0.66 (ranging from 0.62 to 0.72), and for S_ICTFUT it was 0.80 (ranging from 0.74 to 0.85) (see Table 12.17). Item parameters for the three scales are presented in Table 12.18.


Table 12.17: Reliabilities for scales measuring students’ perceptions of ICT

Scale reliability (Cronbach’s alpha)
Country    S_ICTPOS    S_ICTNEG    S_ICTFUT
Chile    0.75    0.65    0.80
Denmark    0.67    0.63    0.84
Finland    0.81    0.67    0.83
France    0.73    0.68    0.82
Germany    0.72    0.65    0.82
Italy    0.73    0.63    0.74
Kazakhstan    0.79    0.68    0.79
Korea, Republic of    0.85    0.72    0.84
Luxembourg    0.73    0.64    0.80
Moscow (Russian Federation)    0.78    0.68    0.80
North Rhine-Westphalia (Germany)    0.70    0.62    0.85
Portugal    0.74    0.66    0.75
United States    0.75    0.67    0.81
Uruguay    0.77    0.68    0.75
ICILS 2018 average    0.75    0.66    0.81

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.

Table 12.18: Item parameters for scales measuring students’ perceptions of ICT

Scale or item    Question/item wording    Item parameters: Delta    Tau(1)    Tau(2)    Tau(3)
S_ICTPOS    How much do you agree or disagree with the following statements about ICT?
S2G28A    Advances in ICT usually improve people's living conditions.    0.01    -1.95    -0.79    2.75
S2G28B    ICT helps us to understand the world better.    -0.11    -2.03    -0.70    2.73
S2G28F    ICT is valuable to society.    0.02    -1.98    -0.64    2.62
S2G28G    Advances in ICT bring many social benefits.    0.08    -2.06    -0.58    2.64
S_ICTNEG    How much do you agree or disagree with the following statements about ICT?
S2G28C    Using ICT makes people more isolated in society.    0.00    -1.65    0.03    1.62
S2G28D    With more ICT there will be fewer jobs.    0.47    -1.62    0.22    1.40
S2G28E    People spend far too much time using ICT.    -0.48    -1.18    -0.27    1.45
S2G28H    Using ICT may be dangerous for people's health.    0.02    -1.33    -0.27    1.59
S_ICTFUT    How much do you agree or disagree with the following statements about ICT?
S2G28    I would like to study subjects related to ICT after [secondary school].    0.34    -1.90    -0.07    1.97
S2G28    I hope to find a job that involves advanced ICT.    0.24    -1.96    -0.06    2.02
S2G28    Learning how to use ICT applications will help me to do the work I am interested in.    -0.57    -1.79    -0.40    2.19
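With four response categories, the same model gives an expected item score between 0 and 3 that rises with theta. This is the item-level counterpart of the statement that higher scale scores correspond to stronger views. The minimal sketch below makes two assumptions: the step term is theta - delta - tau_j (as in the earlier sketch), and the agreement categories are scored 0 to 3. It uses item S2G28A from Table 12.18.

```python
import math


def pcm_expected_score(theta, delta, taus):
    """Expected item score sum_k k * P(X = k) under an assumed partial credit
    model in which the log-odds of category j versus j - 1 is
    theta - delta - taus[j - 1]."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + theta - delta - tau)
    top = max(logits)  # stabilize the exponentials
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return sum(k * w for k, w in enumerate(weights)) / total


if __name__ == "__main__":
    # Item S2G28A (Table 12.18): delta = 0.01, taus = (-1.95, -0.79, 2.75).
    for theta in (-3, -1, 0, 1, 3):
        print(theta, round(pcm_expected_score(theta, 0.01, (-1.95, -0.79, 2.75)), 2))
```

The large positive Tau(3) shows up as a flattening of the curve below 3. Moving from “agree” to “strongly agree” on this item requires a much higher theta than the earlier steps.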
Teacher questionnaire

Teachers’ ICT self-efficacy

Question 7 of the ICILS 2018 teacher questionnaire asked respondents to indicate how well they could do nine different school-related tasks using ICT. They were asked to select “I know how to do this,” “I haven’t done this but I could find out how,” or “I do not think I could do this.” All items were used to derive a scale on teachers’ ICT self-efficacy (T_ICTEFF). Higher scale scores corresponded to a greater degree of teacher self-efficacy in using ICT for these tasks.


A confirmatory factor analysis (see Figure 12.11) showed that the model fit for the pooled sample was highly satisfactory. When comparing the configural and scalar multiple-group models,¹¹ the model fit was somewhat less satisfactory, suggesting some variation in measurement properties across countries. The items in the scale were shown to be highly reliable: the average reliability (Cronbach’s alpha) across countries was 0.81, ranging from 0.73 to 0.90 (see Table 12.19). The item parameters for the scale are presented in Table 12.20.


Figure 12.11: Confirmatory factor analysis of items measuring teachers’ ICT self-efficacy 
Model fit indices    Pooled sample    Multiple-group models: Configural    Metric    Scalar
RMSEA    0.044    0.041    N/A    0.071
CFI    0.98    0.99    N/A    0.94
TLI    0.97    0.98    N/A    0.95


11 Please note that there was no convergence for the metric model. 
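The invariance statements in this chapter rest on how much fit changes as constraints are added to the multiple-group models. The small sketch below computes the change from the configural to the scalar model for the fit tables of Figures 12.8 and 12.11. In both tables the metric model is reported as N/A, so it is stored as None and skipped. The sketch reports only the differences and applies no cut-off rule.

```python
# Fit indices transcribed from the tables for Figures 12.8 and 12.11:
# index -> (configural, metric, scalar); None where the table reports N/A.
FIT = {
    "Figure 12.8": {"RMSEA": (0.071, None, 0.078),
                    "CFI": (0.97, None, 0.94),
                    "TLI": (0.96, None, 0.95)},
    "Figure 12.11": {"RMSEA": (0.041, None, 0.071),
                     "CFI": (0.99, None, 0.94),
                     "TLI": (0.98, None, 0.95)},
}


def configural_to_scalar_change(fit):
    """Scalar minus configural value per index.

    A positive RMSEA change or a negative CFI/TLI change means worse fit
    under the scalar constraints.
    """
    return {index: round(values[2] - values[0], 3) for index, values in fit.items()}


if __name__ == "__main__":
    for figure, fit in FIT.items():
        print(figure, configural_to_scalar_change(fit))
```

For teachers’ ICT self-efficacy (Figure 12.11), the drop in CFI from the configural to the scalar model is larger than for the student model in Figure 12.8. This is consistent with the “somewhat less satisfactory” description.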
Table 12.19: Reliabilities for scale measuring teachers’ ICT self-efficacy

Scale reliability (Cronbach’s alpha)
Country    T_ICTEFF
Chile    0.74
Denmark    0.73
Finland    0.79
France    0.80
Germany    0.80
Italy    0.82
Kazakhstan    0.90
Korea, Republic of    0.84
Luxembourg    0.78
Moscow (Russian Federation)    0.77
North Rhine-Westphalia (Germany)    0.80
Portugal    0.79
United States    0.88
Uruguay    0.82
ICILS 2018 average    0.80
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.


Table 12.20: Item parameters for scale measuring teachers’ ICT self-efficacy

Scale or item    Question/item wording    Item parameters: Delta    Tau(1)    Tau(2)
T_ICTEFF    How well can you do these tasks using ICT?
T2G07A    Find useful teaching resources on the Internet    -1.18    0.70    -0.70
T2G07B    Contribute to a discussion forum/user group on the Internet (e.g. a wiki or blog)    0.58    -1.48    1.48
T2G07C    Produce presentations (e.g. [PowerPoint® or a similar program]), with simple animation functions    -0.15    -0.22    0.22
T2G07D    Use the Internet for online purchases and payments    -0.46    -0.36    0.36
T2G07E    Prepare lessons that involve the use of ICT by students    -0.52    -0.68    0.68
T2G07F    Using a spreadsheet program (e.g. [Microsoft Excel ®]) for keeping records or analyzing data    0.51    -0.79    0.79
T2G07G    Assess student learning    -0.27    -0.94    0.94
T2G07H    Collaborate with others using shared resources such as [Google Docs®], [Padlet]    0.80    -1.24    1.24
T2G07    Use a learning management system (e.g. [Moodle], [Blackboard], [Edmodo])    0.69    -1.27    1.27
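Under the partial credit parameterization assumed in the earlier sketches (step term theta - delta - tau_j), adjacent categories are equally likely at delta + tau_j. Listing these step locations makes disordered steps easy to spot. With the values in Table 12.20, T2G07A is the only item whose first step location lies above its second. The sign convention is an assumption and is not stated in this section. It is adopted because the opposite convention would make nearly every four-category item in this chapter appear disordered. The sketch uses a subset of Table 12.20.

```python
# Subset of Table 12.20: item -> (delta, (tau1, tau2)).
T_ICTEFF = {
    "T2G07A": (-1.18, (0.70, -0.70)),
    "T2G07B": (0.58, (-1.48, 1.48)),
    "T2G07C": (-0.15, (-0.22, 0.22)),
    "T2G07D": (-0.46, (-0.36, 0.36)),
    "T2G07E": (-0.52, (-0.68, 0.68)),
}


def step_locations(delta, taus):
    """Theta values at which adjacent categories are equally likely (delta + tau_j)."""
    return [round(delta + tau, 2) for tau in taus]


def disordered_items(params):
    """Items whose step locations are not strictly increasing."""
    out = []
    for item, (delta, taus) in params.items():
        steps = step_locations(delta, taus)
        if any(a >= b for a, b in zip(steps, steps[1:])):
            out.append(item)
    return out


if __name__ == "__main__":
    for item, (delta, taus) in T_ICTEFF.items():
        print(item, step_locations(delta, taus))
    print("disordered:", disordered_items(T_ICTEFF))
```

A disordered step does not necessarily mean the item is faulty. It does indicate that, under this parameterization, the middle category (“I haven’t done this but I could find out how”) is never the most likely response for that item.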
Teachers’ emphasis on developing ICT skills and coding skills

Teachers were asked in Question 9 to indicate the emphasis they had given to developing different ICT-based capabilities in students in the reference class (selecting from “strong emphasis,” “some emphasis,” “little emphasis,” or “no emphasis”). The nine items in the question were used to derive a scale of teachers’ emphasis on developing ICT capabilities in class (T_ICTEMP).


Similarly, Question 13 required teachers to indicate the emphasis they had given to teaching different skills related to coding in the reference class (using the same response options as in Question 9). The nine items in this question were used to derive a scale of teachers’ emphasis on teaching CT-related tasks (T_CODEMP). Higher scores on either scale correspond to greater emphasis on developing ICT-based capabilities and coding skills.


Figure 12.12: Confirmatory factor analysis of items measuring teachers’ emphasis on learning of ICT and coding tasks

Model fit indices    Pooled sample    Multiple-group models: Configural    Metric    Scalar
RMSEA    0.066    0.063    N/A    0.075
CFI    0.96    0.96    N/A    0.93
TLI    0.95    0.96    N/A    0.94
Figure 12.12 illustrates the results of the confirmatory factor analysis assuming a two-dimensional model with items from the two scales. The model fit was highly satisfactory. When reviewing measurement invariance using multiple-group models with different constraints, the model fit changed only marginally, which indicates a relatively high degree of invariance for this model. There was a high correlation between the two latent factors in the model (0.60). Both scales were highly reliable (see Table 12.21): the average reliability (Cronbach’s alpha) across countries was 0.90 for both, ranging from 0.86 to 0.94 for T_ICTEMP and from 0.88 to 0.93 for T_CODEMP. The item parameters for both scales are presented in Table 12.22.


Table 12.21: Reliabilities for scales measuring teachers’ emphasis on learning of ICT and coding tasks

Scale reliability (Cronbach’s alpha)
Country    T_ICTEMP    T_CODEMP
Chile    0.90    0.92
Denmark    0.87    0.88
Finland    0.91    0.88
France    0.91    0.89
Germany    0.90    0.89
Italy    0.90    0.90
Kazakhstan    0.89    0.91
Korea, Republic of    0.92    0.93
Luxembourg    0.93    0.92
Moscow (Russian Federation)    0.86    0.91
North Rhine-Westphalia (Germany)    0.90    0.89
Portugal    0.92    0.91
United States    0.94    0.92
Uruguay    0.88    0.91
ICILS 2018 average    0.90    0.90
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.
Table 12.22: Item parameters for scales measuring teachers’ emphasis on learning of ICT and coding tasks

Scale or item    Question/item wording    Item parameters: Delta    Tau(1)    Tau(2)    Tau(3)
T_ICTEMP    In your teaching of the reference class in this school year, how much emphasis have you given to developing the following ICT-based capabilities in your students?
T2G09A    To access information efficiently    -0.84    -2.12    -0.49    2.61
T2G09B    To display information for a given audience/purpose    -0.37    -2.00    -0.47    2.46
T2G09C    To evaluate the credibility of digital information    -0.20    -1.95    -0.25    2.20
T2G09D    To share digital information with others    0.09    -2.02    -0.33    2.39
T2G09E    To use computer software to construct digital work products (e.g. presentations, documents, images and diagrams)    -0.29    -1.61    -0.30    1.92
T2G09F    To provide digital feedback on the work of others (such as classmates)    1.23    -1.87    -0.22    2.09
T2G09G    To explore a range of digital resources when searching for information    -0.19    -1.87    -0.35    2.22
T2G09H    To provide references for digital information sources    0.31    -1.70    -0.34    2.04
T2G09I    To understand the consequences of making information publically available online    0.26    -1.57    -0.19    1.76
T_CODEMP    In your teaching of the reference class this school year, how much emphasis have you given to teaching the following skills?
T2G13A    To display information in different ways    -1.26    -1.82    -0.70    2.52
T2G13B    To break a complex process into smaller parts    -0.74    -1.70    -0.61    2.30
T2G13C    To understand diagrams that describe or show real-world problems    -0.06    -1.48    -0.51    2.00
T2G13D    To plan tasks by setting out the steps needed to complete them    -0.75    -1.72    -0.48    2.20
T2G13E    To use tools making diagrams that help solve problems    0.71    -1.48    -0.41    1.89
T2G13F    To use simulations to help understand or solve real-world problems    0.72    -1.4    -0.43    1.84
T2G13G    To make flow diagrams to show the different parts of a process    1.31    -1.38    -0.39    1.77
T2G13H    To record and evaluate data to understand and solve a problem    0.11    -1.46    -0.54    1.99
T2G13I    To use real-world data to review and revise solutions to problems    -0.04    -1.43    -0.54    1.97
Teachers’ use of ICT for class activities


Question 10 of the ICILS 2018 teacher questionnaire asked teachers to select how often students in their reference class used ICT for different activities (they could choose from “they do not engage in this activity,” “they never use ICT in this activity,” “they sometimes use ICT in this activity,” “they often use ICT in this activity,” or “they always use ICT in this activity”). The 14 items in the question were used to derive a scale of teachers’ use of ICT for classroom activities (T_CLASACT), where responses from teachers who indicated that their students did not engage in an activity were treated as missing values. Higher scale scores corresponded to more frequent use of ICT for these activities.


Figure 12.13 illustrates the results of the confirmatory factor analysis assuming a one-dimensional model with items from the scale. The analysis showed only marginally satisfactory model fit once the residual covariance between two items with similar content (items a and b) was taken into account. The model fit was equally marginally satisfactory for multiple-group models with different constraints, which suggests a relatively high level of measurement invariance. The average reliability (Cronbach’s alpha) of the scale was 0.94 (ranging from 0.92 to 0.96) (see Table 12.23). Table 12.24 shows the item parameters for the scale that was used to derive the IRT scale scores.


Figure 12.13: Confirmatory factor analysis of items measuring teachers’ use of ICT for class activities
Model fit indices    Pooled sample    Multiple-group models: Configural    Metric    Scalar
RMSEA    0.091    0.097    0.095    0.095
CFI    0.96    0.96    0.95    0.94
TLI    0.96    0.95    0.95    0.95
Table 12.23: Reliabilities for scale measuring teachers’ use of ICT for class activities
Scale reliability (Cronbach’s alpha)
Country    T_CLASACT
Chile    0.94
Denmark    0.92
Finland    0.92
France    0.94
Germany    0.94
Italy    0.92
Kazakhstan    0.94
Korea, Republic of    0.95
Luxembourg    0.94
Moscow (Russian Federation)    0.92
North Rhine-Westphalia (Germany)    0.92
Portugal    0.96
United States    0.95
Uruguay    0.95
ICILS 2018 average    0.94
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.
Table 12.24: Item parameters for scale measuring teachers’ use of ICT for class activities
Scale or item    Question/item wording    Item parameters: Delta    Tau(1)    Tau(2)    Tau(3)
T_CLASACT    How often do students in your reference class use ICT for the following activities?
T2G10A    Work on extended projects (i.e. lasting over a week)    -0.73    -2.73    0.86    1.87
T2G10B    Work on short assignments (i.e. within one week)    -0.67    -2.91    0.56    2.35
T2G10C    Explain and discuss ideas with other students    0.73    -2.54    0.31    2.23
T2G10D    Submit completed work for assessment    -0.32    -2.28    0.49    1.79
T2G10E    Work individually on learning materials at their own pace    -0.07    -2.67    0.42    2.25
T2G10F    Undertake open-ended investigations or field work    0.03    -2.69    0.59    2.10
T2G10G    Reflect on their learning experiences (e.g. by using a learning log)    0.99    -1.92    0.36    1.57
T2G10H    Communicate with students in other schools on projects    0.63    -2.19    0.27    1.92
T2G10    Plan a sequence of learning activities for themselves    0.97    -2.02    0.32    1.70
T2G10J    Analyze data    0.22    -2.48    0.54    1.94
T2G10    Evaluate information resulting from a search    0.11    -2.49    0.53    1.97
T2G10L    Collect data for a project    -0.90    -2.70    0.67    2.02
T2G10    Create visual products or videos    -0.76    -2.53    0.74    1.79
T2G10    Share products with other students    -0.22    -2.32    0.50    1.82
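Treating “they do not engage in this activity” as missing, rather than as the lowest frequency, keeps two different situations apart: an activity the class never does, and an activity done without ICT. Teachers are then scaled only on activities their students actually undertake. The minimal sketch below shows that recoding. Scoring the remaining four options from 0 (“never”) to 3 (“always”) is an illustrative assumption.

```python
from typing import Optional

# Assumed scoring of the Question 10 response options; None marks a missing value.
T_CLASACT_CODES = {
    "they do not engage in this activity": None,
    "they never use ICT in this activity": 0,
    "they sometimes use ICT in this activity": 1,
    "they often use ICT in this activity": 2,
    "they always use ICT in this activity": 3,
}


def recode_clasact(responses: list[str]) -> list[Optional[int]]:
    """Map response labels to item scores; an unknown label raises KeyError."""
    return [T_CLASACT_CODES[label] for label in responses]


def n_valid(scores: list[Optional[int]]) -> int:
    """Number of items that contribute to the teacher's scale score."""
    return sum(score is not None for score in scores)


if __name__ == "__main__":
    raw = [
        "they often use ICT in this activity",
        "they do not engage in this activity",
        "they never use ICT in this activity",
    ]
    scores = recode_clasact(raw)
    print(scores, n_valid(scores))  # the middle item is missing, not scored 0
```

The same treatment applies to T_ICTPRAC, where “I do not use this practice with the reference class” is set to missing.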
Teachers’ use of ICT for teaching practices

In Question 11, respondents were asked to indicate how often they used ICT for 10 different practices related to teaching the reference class (response options were “I do not use this practice with the reference class,” “I never use ICT with this practice,” “I sometimes use ICT with this practice,” “I often use ICT with this practice,” and “I always use ICT with this practice”). Eight of the 10 items were used to derive the scale of teachers’ use of ICT for teaching practices in class (T_ICTPRAC), where responses from teachers who indicated that they did not use a practice with the reference class were treated as missing values. Higher scale scores corresponded to more frequent use of ICT with the different practices.


Figure 12.14 illustrates the results of the confirmatory factor analysis assuming a one-dimensional 
model with items from the scale. The model fit was satisfactory for the pooled data set. When 
reviewing measurement invariance using multiple-group models with different constraints, the 
model fit changed only marginally, which indicates a relatively high degree of invariance for this
model. The average reliability (Cronbach's alpha) of the scale across countries was high (0.90), 
ranging from 0.86 to 0.93 (see Table 12.25). The item parameters used to derive the IRT scale 
scores are presented in Table 12.26. 


Figure 12.14: Confirmatory factor analysis of items measuring teachers’ use of ICT for teaching practices 
Model fit indices    Pooled sample    Multiple-group models: Configural    Metric    Scalar
RMSEA    0.063    0.074    0.076    0.080
CFI    0.99    0.99    0.98    0.97
TLI    0.99    0.98    0.98    0.98
Table 12.25: Reliabilities for scale measuring teachers’ use of ICT for teaching practices
Scale reliability (Cronbach’s alpha)
Country    T_ICTPRAC
Chile    0.92
Denmark    0.88
Finland    0.86
France    0.89
Germany    0.88
Italy    0.87
Kazakhstan    0.92
Korea, Republic of    0.92
Luxembourg    0.89
Moscow (Russian Federation)    0.90
North Rhine-Westphalia (Germany)    0.86
Portugal    0.91
United States    0.91
Uruguay    0.93
ICILS 2018 average    0.89
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.
Table 12.26: Item parameters for scale measuring teachers’ use of ICT for teaching practices
Scale or item    Question/item wording    Item parameters: Delta    Tau(1)    Tau(2)    Tau(3)
T_ICTPRAC    How often do you use ICT in the following practices when teaching your reference class?
T2G11B    The provision of remedial or enrichment support to individual students or small groups of students    -0.20    -2.41    0.26    2.15
T2G11C    The support of student-led whole-class discussions and presentations    -0.34    -2.34    0.21    2.12
T2G11D    The assessment of students’ learning through tests    -0.10    -1.72    0.20    1.52
T2G11E    The provision of feedback to students on their work    0.21    -2.07    0.27    1.80
T2G11F    The reinforcement of learning of skills through repetition of examples    -0.28    -2.37    0.25    2.12
T2G11G    The support of collaboration among students    0.25    -2.28    0.23    2.05
T2G11H    The mediation of communication between students and experts or external mentors    0.77    -1.78    0.05    1.73
T2G11J    The support of inquiry learning    -0.31    -2.35    0.37    1.98
[…] the ICILS 2018 teacher questionnaire asked teachers how often they used different […] in the teaching of their reference classes. For each tool, respondents were asked […] “in most lessons,” or “in every or almost every lesson.” […] derived from the set of items: teachers’ use of digital learning tools (T_USETOOL) and teachers’ use of general utility software (T_USEUTIL). Higher scores on either scale reflect more […] these types of tools.

[…] analysis assuming a two-dimensional […] models, the fit became unsatisfactory, a […]

[…] 0.65 to 0.82 for the latter) (see Table 12.27).
Model fit indices    Pooled sample    Multiple-group models: Configural    Metric    Scalar
RMSEA    0.063    0.087    0.109    0.105
CFI    0.96    0.94    0.88    0.87
TLI    0.95    0.92    0.88    0.89
Table 12.27: Reliabilities for scales measuring teachers’ use of ICT tools in class
Scale reliability (Cronbach’s alpha)
Country    T_USETOOL    T_USEUTIL
Chile    0.87    0.81
Denmark    0.71    0.66
Finland    0.75    0.68
France    0.79    0.65
Germany    0.83    0.71
Italy    0.79    0.78
Kazakhstan    0.91    0.80
Korea, Republic of    0.90    0.78
Luxembourg    0.78    0.70
Moscow (Russian Federation)    0.89    0.82
North Rhine-Westphalia (Germany)    0.79    0.74
Portugal    0.80    0.76
United States    0.86    0.72
Uruguay    0.86    0.78
ICILS 2018 average    0.82    0.74
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, excluding benchmarking participants.
Table 12.28: Item parameters for scales measuring teachers’ use of ICT use in class 
Scale or item Question/item wording Item parameters 
Delta Tau(1) Tau(2) Tau(3) 
InUSETOOE How often did you use the following tools in your teaching of the reference class this school year? 
T2G12A Practice programs or apps where you ask students 0.02 -1.87 0.61 1.27 
questions (e.g. [Quizlet, Kahoot], [mathfessor]) 
T2G12B Digital learning games -0.16 -2.08 0.69 1.38 
T2G12G Concept mapping software (e.g. [Inspiration ®], 0.54 -1.18 0.26 0.93 
Webspiration ®)]) 
T2G12H Simulations and modelling software (e.g. [NetLogo]) 0.88 -0.72 -0.02 0.74 
T2G12 A learning management system (e.g. [Edmodo], -1.03 -0.43 0.24 0.20
[Blackboard])
T2G12 Collaborative software (e.g. [Google Docs ®], -0.37 -1.25 0.29 0.96
[OneNote], [Padlet])
T2G12 Interactive digital learning resources -0.64 -1.35 0.24 kd:
(e.g. learning objects)
T2G12 Graphing or drawing software 0.26 -1.15 0.18 0.97
T2G120 e-portfolios (e.g. [VoiceThread]) 0.58 -0.57 -0.09 0.66 
T2G12Q Social media (e.g. [Facebook, Twitter]) -0.08 -1.02 0.34 0.68 
T_USEUTIL How often did you use the following tools in your teaching of the reference class this school year? 
T2G12C Word-processor software (e.g. [Microsoft Word ®]) -0.24 -1.93 0.52 1.41 
T2G12D Presentation software (e.g. [Microsoft PowerPoint ®]) -0.25 -2.05 0.55 1.50
T2G12L Computer-based information resources (e.g. 0.10 -2.25 0.53 1.72
topic-related websites, wikis, encyclopaedia)
T2G12P Digital contents linked with textbooks 0.39 -1.37 0.39 0.98 




ICILS 2018 TECHNICAL REPORT 


Teachers’ perceptions of ICT resources and teacher collaboration 


Question 14 of the ICILS 2018 teacher questionnaire asked teachers to provide their level of agreement or disagreement (“strongly agree,” “agree,” “disagree,” or “strongly disagree”) with a series of statements about ICT resources at their schools. Question 15 requested respondents to rate
Figure 12.16: Confirmatory factor analysis of items measuring teachers’ perceptions of ICT resources and teacher collaboration at school
[Path diagram not reproduced: factor T_RESRC measured by items q14b to q14h; second factor measured by items q15a to q15e]
Model fit indices: Pooled sample Multiple-group models 
Configural Metric Scalar 
Table 12.29: Reliabilities for scales measuring teachers’ perceptions of ICT resources and teacher collaboration
at school
Country Scale reliability (Cronbach’s alpha) 
T_RESRC mCOhieh 
Chile 0.89 0.90 
Denmark 0.84 0.83 
Finland 0.83 0.86 
France 0.85 0.86 
Germany 0.89 0.77 
Italy 0.87 0.89
Kazakhstan 0.93 0.85
Korea, Republic of 0.90 0.89
Luxembourg 0.84 0.85 
Moscow (Russian Federation) 0.88 0.88 
North Rhine-Westphalia (Germany) 0.87 0.81 
Portugal 0.86 0.86 
United States 0.90 0.92 
Uruguay 0.88 0.89 
ICILS 2018 average 0.87 0.85 
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries,
excluding benchmarking participants. 
Table 12.30: Item parameters for scales measuring teachers’ perceptions of ICT resources and teacher 
collaboration at school 
Scale or item Question/item wording Item parameters 
Delta Tau(1) Tau(2) Tau(3) 
T_RESRC To what extent do you agree or disagree with the following statements about using ICT in teaching at your school?
T2G14B My school has sufficient ICT equipment (e.g. computers). -0.42 -2.15 -0.12 2.26
T2G14C The computer equipment in our school is up-to-date. -0.30 -2.29 -0.22 2.51
T2G14D My school has access to sufficient digital learning -0.21 -2.50 -0.18 2.68
resources (e.g. learning software or [apps]).
T2G14E My school has good connectivity (e.g. fast speed) -0.06 -2.12 -0.30 2.42
to the Internet.
T2G14F There is enough time to prepare lessons that 0.69 -2.72 -0.05 2.77 
incorporate ICT. 
T2G14G There is sufficient opportunity for me to develop 0.21 -2.82 -0.23 3.05
expertise in ICT. 
T2G14H There is sufficient technical support to maintain ICT 0.09 -2.44 -0.31 2.76 
resources. 
(EOL Grp To what extent do you agree or disagree with the following statements about your use of ICT in teaching and learning 
at your school? 
T2G15A I work together with other teachers on improving the 0.18 -3.62 -0.35 3.97
use of ICT in classroom teaching.
T2G15B I collaborate with colleagues to develop ICT-based 0.34 -3.87 -0.23 4.10
lessons.
T2G15C I observe how other teachers use ICT in teaching. 0.01 -3.57 -0.69 4.26
T2G15D I discuss with other teachers how to use ICT in -0.24 -3.67 -0.80 4.47
teaching topics.
T2G15E I share ICT-based resources with other teachers in -0.29 -3.40 -0.62 4.03
my school.
Teachers’ reports on ICT-related professional learning


Question 17 of the ICILS 2018 teacher questionnaire asked teachers to indicate their participation 
Figure 12.17: Confirmatory factor analysis of items measuring teachers’ participation in ICT-related professional learning
Model fit indices:   Pooled sample   Multiple-group models
                                     Configural   Metric   Scalar
RMSEA                0.074           0.076        0.095    0.096
CFI                  0.96            0.97         0.94     0.92
TLI                  0.95            0.96         0.93     0.93
Table 12.31: Reliabilities for scales measuring teachers’ participation in ICT-related professional learning


Country Scale reliability (Cronbach’s alpha) 
T_PROFSTR T_PROFREC 
Chile 0.84 0.74 
Denmark 0.75 0.64 
Finland 0.63 0.67 
France 0.72 0.69 
Germany 0.72 0.62 
Italy 0.79 0.68
Kazakhstan 0.87 0.79
Korea, Republic of 0.83 0.78
Luxembourg 0.73 0.73 
Moscow (Russian Federation) 0.80 0.76 
North Rhine-Westphalia (Germany) 0.64 0.58 
Portugal 0.75 0.65 
United States 0.82 0.77
Uruguay 0.80 0.74 
ICILS 2018 average 0.76 0.69 


Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries,
excluding benchmarking participants.


Table 12.32: Item parameters for scales measuring teachers’ participation in ICT-related professional learning
Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2) 

T_PROFSTR How often have you participated in any of the following professional learning activities in the past two years? 

T2G17A A course on ICT applications (e.g. word processing, -0.62 -0.52 0.52 
presentations, internet use, spreadsheets, databases) 

T2G17B A course or webinar on integrating ICT into teaching -0.34 -0.55 0.55 
and learning 

T2G17C Training on subject-specific digital teaching and -0.45 -0.70 0.70 
learning resources 

T2G17H A course on use of ICT for [students with special needs 0.84 -0.24 0.24 
or specific learning difficulties] 

T2G17 A course on how to use ICT to support personalized 0.58 -0.31 0.31
learning by students

T_PROFREC How often have you participated in any of the following professional learning activities in the past two years?

T2G17D Observations of other teachers using ICT in teaching -0.40 0.10 -0.10

T2G17E An ICT-mediated discussion or forum on teaching and 0.39 0.19 -0.19
learning

T2G17F The sharing of digital teaching and learning resources -0.31 0.08 -0.08 
with others through a collaborative workspace 

T2G17G Use of a collaborative workspace to jointly evaluate 0.33 0.31 -0.31 
student work 


Teachers’ perceptions of positive and negative outcomes of using ICT for teaching and learning


Question 18 of the ICILS 2018 teacher questionnaire asked teachers to provide their level of agreement or disagreement (“strongly agree,” “agree,” “disagree,” or “strongly disagree”) with a series of statements about positive and negative outcomes of using ICT for teaching and learning. Two scales were derived from the set of items: teachers' perceptions of positive outcomes when using ICT in teaching and learning (T_VWPOS) and teachers' perceptions of negative outcomes when using ICT in teaching and learning (T_VWNEG). Higher scores on either scale reflect higher levels of agreement with the statements in that scale.


Figure 12.18 illustrates the results of the confirmatory factor analysis assuming a two-dimensional model with items from the two scales. The model fit was satisfactory for the pooled dataset; however, with increasing constraints across the different multiple-group models the fit became unsatisfactory, a finding which suggests a certain degree of variation in measurement characteristics. There was a moderately high negative correlation between the two latent factors (-0.40).
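The fit tables in this chapter report RMSEA, CFI, and TLI but not the chi-square statistics behind them, and the report does not say here which software or conventions were used. As a sketch only, the textbook definitions of the three indices are shown below, computed from a target model's chi-square and degrees of freedom and those of a baseline (independence) model. Conventions differ between SEM programs (for example N versus N - 1 in RMSEA); the function names and the example chi-square values are hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (N - 1 convention; some programs use N)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index of target model m relative to baseline model b."""
    excess_m = max(chi2_m - df_m, 0.0)
    excess_b = max(chi2_b - df_b, excess_m)  # denominator never smaller than numerator
    return 1.0 if excess_b == 0.0 else 1.0 - excess_m / excess_b


def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis (non-normed) index; unlike CFI it can fall outside [0, 1]."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


if __name__ == "__main__":
    # Hypothetical inputs: 11 items, so the independence model has 11 * 10 / 2 = 55 df.
    chi2_m, df_m, chi2_b, df_b, n = 420.0, 43, 9000.0, 55, 3000
    print(round(rmsea(chi2_m, df_m, n), 3),
          round(cfi(chi2_m, df_m, chi2_b, df_b), 3),
          round(tli(chi2_m, df_m, chi2_b, df_b), 3))
```

Comparing the indices of the configural, metric, and scalar models in this way shows how fit degrades as more parameters are constrained to be equal across countries, which is the pattern described in the paragraph above.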


Figure 12.18: Confirmatory factor analysis of items measuring teachers’ perceptions of positive and negative outcomes of using ICT for teaching and learning
Model fit indices:   Pooled sample   Multiple-group models
                                     Configural   Metric   Scalar
RMSEA                0.063           0.094        N/A      0.092
CFI                  0.96            0.94         N/A      0.91
TLI                  0.95            0.93         N/A      0.93


The average reliabilities (Cronbach's alpha) of the two scales across countries were 0.83 for T_VWPOS and 0.80 for T_VWNEG (ranging from 0.79 to 0.87 for the former, and from 0.76 to 0.85 for the latter) (see Table 12.33). Table 12.34 shows the item parameters for each of the two scales that were used to derive the IRT scale scores.
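All reliability tables in this section report Cronbach's alpha, α = k/(k - 1) · (1 - Σ s²_i / s²_X), where k is the number of items in a scale, s²_i is the variance of item i, and s²_X is the variance of the summed score. The sketch below computes it for a small made-up response matrix. It assumes complete data (how missing item responses were handled for the reported alphas is not described here), and the function name and toy data are illustrative only.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for complete item-score data.

    responses: one row per respondent, each row holding k item scores.
    Population variances are used throughout; because every variance shares
    the same n, the ratio (and hence alpha) equals the sample-variance version.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*responses))
    sum_item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - sum_item_var / total_var)


# Toy data: four respondents answering three 4-point items (scored 0 to 3).
toy = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 2, 3],
]
print(round(cronbach_alpha(toy), 3))
```

Alpha rises when items covary strongly relative to their individual variances, which is why scales with more, and more homogeneous, items tend to show the higher values seen in the tables.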
Table 12.33: Reliabilities for scales measuring teachers’ perceptions of positive and negative outcomes of
using ICT for teaching and learning 

Country Scale reliability (Cronbach's alpha) 

T_VWPOS T_VWNEG

Chile 0.86 0.81 

Denmark 0.81 0.77 

Finland 0.79 0.76 

France 0.82 0.80 

Germany 0.82 0.80 

Italy 0.86 0.85

Kazakhstan 0.87 0.77
Korea, Republic of 0.83 0.82

Luxembourg 0.80 0.80 

Moscow (Russian Federation) 0.84 0.82 

North Rhine-Westphalia (Germany) 0.83 0.81 

Portugal 0.84 0.82 

United States 0.87 0.81 

Uruguay 0.87 0.83 

ICILS 2018 average 0.83 0.80 
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries,
excluding benchmarking participants.


Table 12.34: Item parameters for scales measuring teachers’ perceptions of positive and negative outcomes 
of using ICT for teaching and learning 
Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2) Tau(3) 
T_VWPOS To what extent do you agree or disagree with the following practices and principles in relation to the use of ICT in 
teaching and learning? 
T2G18B Helps students develop greater interest in learning. -0.67 -2.84 -1.01 3.85
T2G18C Helps students to work at a level appropriate to their -0.42 -3.51 -0.66 4.17
learning needs.
T2G18E Helps students develop problem solving skills. 0.29 -3.66 -0.63 4.29
T2G185 Enables students to collaborate more effectively. 0.27 -3.93 -0.49 4.42
T2G18 Helps students develop skills in planning and 0.61 -3.88 -0.41 4.29
self-regulation of their work.
T2G18L Improves academic performance of students. 0.73 -3.89 -0.41 4.30
T2G18 Enables students to access better sources of information. -0.82 -2.87 -1.06 3.93
T_VWNEG To what extent do you agree or disagree with the following practices and principles in relation to the use of ICT in 
teaching and learning? 
T2G18A Impedes concept formation by students. 0.93 -2.64 0.82 1.82
T2G18D Results in students copying material from Internet -0.99 -3.05 -0.02 3.07
sources.
T2G18F Distracts students from learning. 0.33 -3.26 0.55 2.71
T2G18G Results in poorer written expression among students. -0.25 -2.93 0.29 2.63
T2G18H Results in poorer calculation and estimation skills 0.09 -3.28 0.57 2.71
among students.
T2G18 Limits the amount of personal communication -0.10 -3.04 0.50 205)
among students.




School questionnaires 


School principals’ use of ICT 




The school principal questionnaire asked respondents to indicate how often they used ICT for different school-related activities (for each item they could select from “never,” “less than once a month,” “at least once a month but not every week,” “at least once a week but not every day,” and “every day”). Nine of the items were used to derive the scale principals’ use of ICT for general school-related activities (P_ICTUSE) and four of the remaining five items were used to derive the scale principals’ use of ICT for school-related communication activities (P_ICTCOM). Higher scores on these two scales represent more frequent use of ICT.

The fit of a confirmatory factor analysis assuming a two-dimensional model with items from the two scales (see Figure 12.19) was satisfactory for the pooled ICILS 2018 sample. The two latent factors in the model were strongly correlated (0.70). Reliabilities for the two scales were satisfactory in most countries (see Table 12.35). P_ICTUSE had an average reliability of 0.79 (ranging from 0.68 to 0.87) and P_ICTCOM had an average reliability of 0.70 (ranging from 0.47 to 0.79). Table 12.36 records the IRT parameters for each of the two scales.

Figure 12.19: Confirmatory factor analysis of items measuring school principals’ use of ICT
Model fit indices: Pooled sample
RMSEA 0.058 
CFI 0.92 
TLI 0.90 
Table 12.35: Reliabilities for scales measuring principals’ use of ICT


Country Scale reliability (Cronbach's alpha) 
P_ICTUSE P_ICTCOM 
Chile 0.85 0.74 
Denmark 0.77 0.54 
Finland 0.73 0.78 
France 0.73 0.76 
Germany 0.81 0.67 
Italy 0.81 0.66
Kazakhstan 0.80 0.81
Korea, Republic of 0.87 0.82
Luxembourg 0.78 0.47 
Moscow (Russian Federation) 0.81 0.79 
North Rhine-Westphalia (Germany) 0.68 0.62 
Portugal 0.76 0.71 
United States 0.83 0.83 
Uruguay 0.78 0.59 
ICILS 2018 average 0.78 0.70 
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries,
excluding benchmarking participants.


Table 12.36: Item parameters for scales measuring principals’ use of ICT


Scale or item Question/item wording Item parameters 
Delta Tau(1) Tau(2) Tau(3) 
P_ICTUSE How often do you use ICT for the following activities? 
P2G02B Provide information about an educational issue -0.15 -0.57 -0.66 1.24
through a website
P2G02C Look up records in a database (e.g. in a student -0.91 -0.01 -0.56 0.57
information system)
P2G02D Maintain, organize and analyze data (e.g. with a -0.55 -0.37 -0.43 0.80
spreadsheet or database)
P2G02E Prepare presentations 0.49 -1.31 -0.30 1.61
P2G02J Work with a learning management system (e.g. [Moodle]) 0.56 0.13 -0.36 0.23
P2G02 Use social media to communicate with the wider 0.45 0.37 -0.81 0.44
community about school-related activities
P2G02L Management of staff (e.g. scheduling, professional -0.48 -0.46 -0.21 0.67
development)
P2G02 Preparing the curriculum 0.47 -0.34 -0.32 0.66
P2G02 School financial management 0.11 -0.18 -0.35 0.53 
P_ICTCOM How often do you use ICT for the following activities? 
P2G02F Communicate with teachers in your school -1.15 -0.03 -0.78 0.81 
P2G02G Communicate with education authorities -0.08 -1.19 -0.31 1.50 
P2G02H Communicate with principals and senior staff in other 0.43 -1.14 -0.52 1.65 
schools 
P2G02 Communicate with parents 0.80 -0.92 -0.59 1.51 
School principals’ views on using ICT
In Question 9 of the ICILS 2018 principal questionnaire, respondents were given seven different ICT-related outcomes of education in their school and were asked to rate the importance of each (selecting from “very important,” “quite important,” “somewhat important,” and “not important”). The first six of the seven items in the question were used to derive the scale principals’ views on using ICT for educational outcomes (P_VWICT). Higher scale scores corresponded to higher levels of importance assigned to ICT-related skills as an outcome of learning.

Figure 12.20 presents the results of the confirmatory factor analysis. The one-dimensional model had a highly satisfactory fit for the pooled ICILS 2018 dataset. Table 12.37 shows the scale reliabilities (Cronbach's alpha) across countries, which averaged 0.85 (ranging from 0.75 to 0.96 across countries). Table 12.38 shows the IRT item parameters that were used to derive the final scale scores.

Figure 12.20: Confirmatory factor analysis of items measuring principals’ views on using ICT
Model fit indices: Pooled sample
RMSEA 0.033 
CFI 1.00 
TLI 1.00 
Table 12.37: Reliabilities for scale measuring principals’ views on using ICT


Country Scale reliability (Cronbach’s alpha) 
P_VWICT 
Chile 0.96 
Denmark 0.81 
Finland 0.84 
France 0.89 
Germany 0.84 
Italy 0.86
Kazakhstan 0.85
Korea, Republic of 0.89
Luxembourg 0.82 
Moscow (Russian Federation) 0.75 
North Rhine-Westphalia (Germany) 0.87 
Portugal 0.85 
United States 0.93 
Uruguay 0.92 
ICILS 2018 average 0.85 
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries,
excluding benchmarking participants.


Table 12.38: Item parameters for scale measuring principals’ views of using ICT


Scale or item Question/item wording Item parameters 
Delta Tau(1) Tau(2) Tau(3) 
P_VWICT How important is each of the following outcomes of education in your school? 
P2G09A The development of students’ basic computer skills -0.06 -2.61 -0.58 3.19
(e.g. internet use, email, word processing, presentation
software)
P2G09B The development of students’ skills in using ICT for 0.39 -4.10 0.30 3.80
collaboration with others
P2G09C The use of ICT for facilitating students’ responsibility 0.66 -3.83 0.21 3.62
for their own learning
P2G09D The use of ICT to augment and improve students’ 0.07 -4.16 0.22 3.94
learning
P2G09E The development of students’ understanding and skills -0.67 -3.54 -0.01 3.55
relating to safe and appropriate use of ICT
P2G09F The development of students’ proficiency in accessing -0.39 -3.74 -0.07 3.81
and using information with ICT
and using information with ICT 
School principals’ reports on expected ICT knowledge and skills of teachers

In Question 11, school principals were asked whether teachers in their school were expected to acquire knowledge and skills in a range of different activities related to ICT. For each activity they were asked to select either “expected and required,” “expected but not required,” or “not expected.” Data from eight items were used to derive the scale principals’ reports on expectations of ICT use by teachers (P_EXPLRN) and data from the remaining three items provided the basis for the scale principals' reports on expectations for teacher collaboration using ICT (P_EXPTCH). Higher scores on these two scales represent higher levels of expectations from the principals.

Figure 12.21 provides the results of the confirmatory factor analysis of data from the items in the question. The two-dimensional model had a marginally satisfactory fit for the pooled dataset. The results also showed a relatively high positive correlation between the two latent factors (0.69). Table 12.39 displays the scale reliabilities (Cronbach's alpha) for the two scales. P_EXPLRN had an average reliability across countries of 0.78 (ranging from 0.66 to 0.89), whereas P_EXPTCH had an average reliability across countries of 0.70 (ranging from 0.59 to 0.86). Table 12.40 shows the IRT item parameters that were used to derive the final scale scores for each of the two scales.


Figure 12.21: Confirmatory factor analysis of items measuring principals’ reports on expected ICT knowledge and skills of teachers
Model fit indices: Pooled sample
RMSEA 0.083
CFI 0.95
TLI 0.93
Table 12.39: Reliabilities for scales measuring principals’ reports on expected ICT knowledge and skills
of teachers 


Country Scale reliability (Cronbach’s alpha) 
P_EXPLRN P_EXPTCH 
Chile 0.89 0.78 
Denmark 0.73 0.60 
Finland 0.74 0.63 
France 0.74 0.76 
Germany 0.77 0.64 
Italy 0.75 0.65
Kazakhstan 0.78 0.73
Korea, Republic of 0.89 0.86
Luxembourg 0.81 0.86 
Moscow (Russian Federation) 0.76 0.69 
North Rhine-Westphalia (Germany) 0.66 0.72 
Portugal 0.69 0.59 
United States 0.81 0.65 
Uruguay 0.81 0.65 
ICILS 2018 average 0.77 0.71 


Table 12.40: Item parameters for scales measuring principals’ reports on expected ICT knowledge and skills 
of teachers 


Scale or item Question/item wording Item parameters 
Delta Tau(1) Tau(2) 
P_EXPLRN Are teachers in your school expected to acquire knowledge and skills in each of the following activities? 
P2G11A Integrate Web-based learning in their instructional -0.54 -2.19 2.19
practice
P2G11B Use ICT-based forms of student assessment 0.04 -1.46 1.46
P2G11C Use ICT for monitoring student progress -0.01 -1.66 1.66
P2G11G Integrate ICT into teaching and learning -1.97 -2.66 2.66
P2G11H Use subject-specific digital learning resources -0.33 -2.19 2.19
(e.g. tutorials, simulation)
P2G11 Use e-portfolios for assessment 1.60 -1.38 1.38 
P2G11J Use ICT to develop authentic (real-life) assignments 0.94 -2.06 2.06 
for students 
P2G11 Assess students’ [computer and information literacy] 0.27 -1.59 1.59 
P_EXPTCH Are teachers in your school expected to acquire knowledge and skills in each of the following activities? 
P2G11D Collaborate with other teachers via ICT -0.49 -2.70 2.70 
P2G11E Communicate with parents via ICT 0.17 -1.77 1.77 
P2G11F Communicate with students via ICT 0.32 -2.32 2.32 






School principals’ reports on priorities for ICT use at schools 


School principals were asked in Question 15 to indicate the priority given to different ways of facilitating the use of ICT in teaching and learning. For each item they were asked to select “high priority,” “medium priority,” “low priority,” or “not a priority.” The data from the first three items in the question were used to derive the scale principals’ reports on priorities for facilitating use of ICT - hardware (P_PRIORH). The data from the remaining seven items were used to derive the scale principals’ reports on priorities for facilitating use of ICT - support (P_PRIORS). Higher values on each scale reflect higher levels of perceived priority.

Figure 12.22 depicts the results from the confirmatory factor analysis of the scaled items from the question. There was a satisfactory fit for the two-factor model, and we found a high positive correlation between the two latent factors (0.61). The reliabilities (Cronbach's alpha) of the scales for each country are presented in Table 12.41. On average, the reliability of P_PRIORH across countries was 0.79 (ranging from 0.43 to 0.90), while the average reliability of P_PRIORS across countries was 0.84 (ranging from 0.77 to 0.92). Table 12.42 records the IRT item parameters used to scale these items.


Figure 12.22: Confirmatory factor analysis of items measuring school principals’ reports on priorities for ICT 
use at schools 
Model fit indices: Pooled sample
RMSEA 0.071
CFI 0.96
TLI 0.95
Table 12.41: Reliabilities for scales measuring school principals’ reports on priorities for ICT use at schools


Country Scale reliability (Cronbach's alpha) 
P_PRIORH P_PRIORS 
Chile 0.89 0.90 
Denmark 0.72 0.77 
Finland 0.73 0.79 
France 0.79 0.82 
Germany 0.68 0.86 
Italy 0.76 0.83
Kazakhstan 0.90 0.86
Korea, Republic of 0.88 0.92
Luxembourg 0.88 0.82 
Moscow (Russian Federation) 0.83 0.82 
North Rhine-Westphalia (Germany) 0.43 0.85 
Portugal 0.78 0.82 
United States 0.80 0.84 
Uruguay 0.73 0.88 
ICILS 2018 average 0.79 0.84
Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries,


excluding benchmarking participants. 


Table 12.42: Item parameters for scales measuring school principals’ reports on priorities for ICT use at schools


Scale or item Question/item wording Item parameters 
Delta Tau(1) Tau(2) Tau(3) 
P_PRIORH At your school, what priority is given to the following ways of facilitating the use of ICT in teaching and learning? 
P2G15A Increasing the numbers of computers per student in 0.09 -1.40 -0.49 1.88
the school
P2G15B Increasing the number of computers connected to 0.12 -0.82 -0.61 1.43
the Internet
P2G15C Increasing the bandwidth of Internet access for the -0.21 -0.70 -0.69 1.39
computers connected to the Internet
P_PRIORS At your school, what priority is given to the following ways of facilitating the use of ICT in teaching and learning?
P2G15D Increasing the range of digital learning resources -0.99 -1.95 -0.36 2.32
available for teaching and learning
P2G15E Establishing or enhancing an online learning support 0.24 -1.62 -0.25 1.87
platform
P2G15F Supporting participation in professional development -0.61 -1.83 -0.18 2.01
on pedagogical use of ICT
P2G15G Increasing the availability of qualified technical 0.11 -1.22 -0.15 1.37
personnel to support the use of ICT
P2G15H Providing teachers with incentives to integrate ICT use 0.26 -1.16 -0.34 154
in their teaching
P2G15I Providing more time for teachers to prepare lessons 1.43 -1.28 -0.18 1.46
in which ICT is used
P2G15J Increasing the professional learning resources for -0.15 -1.70 -0.38 2.08
teachers in the use of ICT
Availability of digital resources at school

Questions 4 and 5 of the school ICT coordinator questionnaire asked respondents to report the availability of different technology and software resources in their school, with the response options “available to teachers and students,” “available only to teachers,” “available only to students,” or “not available.” Thirteen items across the two questions were used to derive the scale ICT coordinators’ reports on availability of ICT resources at school (C_ICTRES). Higher scores correspond to greater availability of ICT-related resources.

Figure 12.23 presents the results of the confirmatory factor analysis. The one-dimensional model had a highly satisfactory fit. Table 12.43 shows the scale reliabilities (Cronbach's alpha), which had a relatively satisfactory average across countries (0.74, ranging from 0.57 to 0.80). Table 12.44 shows the IRT parameters that were used to derive the scale scores.
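The item-parameter tables list a location (Delta) and step parameters (Tau) for each item, but this section does not restate the model equation. As a sketch, assuming the partial credit parameterization in which step j of an item has difficulty Delta + Tau(j) (consistent with the Tau values of each item summing to roughly zero in the tables), the category probabilities and expected item score at a latent score θ can be computed as below. θ is on the logit metric of the calibration; how it is transformed into reported scale scores is not covered here, and the function names are hypothetical.

```python
import math


def pcm_probabilities(theta, delta, taus):
    """Category probabilities P(X = 0), ..., P(X = m) for one polytomous item.

    Assumed parameterization: log[P(X = j) / P(X = j - 1)] = theta - (delta + taus[j - 1]).
    """
    log_numerators = [0.0]  # category 0 is the reference
    for tau in taus:
        log_numerators.append(log_numerators[-1] + (theta - delta - tau))
    shift = max(log_numerators)  # subtract the max before exp() for numerical stability
    weights = [math.exp(v - shift) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, delta, taus):
    """Model-implied mean item score at latent score theta."""
    return sum(x * p for x, p in enumerate(pcm_probabilities(theta, delta, taus)))


# Item 2G04C from Table 12.44 (access to the Internet through the school network).
delta, taus = -1.69, [0.07, -0.07]
for theta in (-3.0, -1.69, 0.0):
    probs = pcm_probabilities(theta, delta, taus)
    print(theta, [round(p, 3) for p in probs], round(expected_score(theta, delta, taus), 2))
```

Under this parameterization the expected score increases with θ, matching the convention that higher scale scores correspond to greater availability (or, for the other scales, more frequent use, stronger agreement, and so on).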


Figure 12.23: Confirmatory factor analysis of items measuring school ICT coordinators’ reports on the
availability of digital resources at school 
Model fit indices: Pooled sample
RMSEA 0.048 
CFI 0.89 
TLI 0.87 
Table 12.43: Reliabilities for scale measuring school ICT coordinators’ reports on the availability of digital
resources at school 


Country Scale reliability (Cronbach's alpha) 
C_ICTRES 
Chile 0.80 
Denmark 0.68 
Finland 0.71 
France 0.71 
Germany 0.79 
Italy 0.78
Kazakhstan 0.80
Korea, Republic of 0.79
Luxembourg 0.62 
Moscow (Russian Federation) 0.78 
North Rhine-Westphalia (Germany) 0.57 
Portugal 0.74 
United States 0.70 
Uruguay 0.75 
ICILS 2018 average 0.74


Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries, 
excluding benchmarking participants. 


Table 12.44: Item parameters for scale measuring ICT coordinators’ reports on the availability of digital 
resources at school
Scale or item Question/item wording Item parameters
Delta Tau(1) Tau(2)
C_ICTRES Please indicate the availability of the following technology resources in your school.
2G04B Digital learning resources that can only be used online -0.94 1.14 -1.14 
2G04C Access to the Internet through the school network -1.69 0.07 -0.07 
2G04D Access to an education site or network maintained by -0.29 0.52 -0.52 
education authorities 
2G04E Email accounts for school-related use -0.24 -0.49 0.49 
2G05A Practice programs or [apps] where teachers decide 0.17 0.93 -0.93
which questions are asked of students
(e.g. [Quizlet, Kahoot], [mathfessor])
2G05B Single user digital learning games (e.g. [languages online]) 0.35 1.48 -1.48
2G05C Multi-user digital learning games with graphics and 1.16 1.68 -1.68
inquiry tasks (e.g. [Quest Atlantis])
2G05F Video and photo software for capture and editing -0.96 1.10 -1.10
(e.g. [Windows Movie Maker, iMovie, Adobe Photoshop])
2G05G Concept mapping software (e.g. [Inspiration ®], 0.57 1.39 -1.39
[Webspiration ®])
2G05H Data logging and monitoring tools (e.g. [Logger Pro]) 1.47 1.09 -1.09
that capture real-world data digitally for analysis
(e.g. speed, temperature)
2G055 A learning management system (e.g. [Edmodo], 0.00 1.21 -1.21
[Blackboard])
2G05L e-portfolios (e.g. [VoiceThread]) 0.83 1.25 -1.25 
2G05M Digital contents linked with textbooks -0.42 0.56 -0.56 
Hindrances to the use of ICT for teaching and learning at school

In Question 13 of the ICILS 2018 ICT coordinator questionnaire, respondents were asked to indicate the extent to which teaching and learning at their school is hindered by different resource-related obstacles. For each of the 14 obstacles, they could rate its impact as “a lot,” “to some extent,” “very little,” or “not at all.” Data from the first six items in the question were used to derive the scale ICT coordinators’ reports on computer resource hindrances to the use of ICT in teaching and learning (C_HINRES). Data from six of the remaining eight items were used to derive the scale ICT coordinators’ reports on pedagogical resource hindrances to the use of ICT in teaching and learning (C_HINPED). Higher scale scores corresponded to greater perceived hindrance from obstacles to the use of ICT for teaching and learning in schools.

Figure 12.24 illustrates the results of the confirmatory factor analysis assuming a two-dimensional model with the scaled items. The model had a satisfactory fit for the pooled ICILS 2018 dataset, and there was a strong correlation between the latent factors (0.64).


Figure 12.24: Confirmatory factor analysis of items measuring ICT coordinators’ reports on hindrances to the 
use of ICT for teaching and learning at school 
Model fit indices: Pooled sample
RMSEA 0.060
CFI 0.96
TLI 0.95


As evident in Table 12.45, the scale reliabilities for both scales were high. C_HINRES had an 
average reliability across countries of 0.81 (ranging from 0.69 to 0.88) and C_HINPED had an 
average reliability across countries of 0.79 (ranging from 0.68 to 0.91). Table 12.46 shows the 
item parameters for the scales that were used to derive the IRT scale scores. 
Table 12.45: Reliabilities for scales measuring ICT coordinators’ reports on hindrances to the use of ICT for
teaching and learning at school 


Country Scale reliability (Cronbach’s alpha) 
C_HINRES C_HINPED 

Chile 0.85 0.83 
Denmark 0.80 0.81 
Finland 0.72 0.68 
France 0.81 0.80 
Germany 0.76 0.73 
Italy 0.81 0.69
Kazakhstan 0.88 0.86
Korea, Republic of 0.86 0.91
Luxembourg 0.69 0.69 
Moscow (Russian Federation) 0.79 0.88 
North Rhine-Westphalia (Germany) 0.70 0.76 
Portugal 0.81 0.86 
United States 0.88 0.88 
Uruguay 0.84 0.77 
ICILS 2018 average 0.81 0.79

Notes: Benchmarking participants in italics. The ICILS 2018 average is based on data from the participating countries,
excluding benchmarking participants.


Table 12.46: Item parameters for scales measuring ICT coordinators’ reports on hindrances to the use of ICT
for teaching and learning at school 
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Scale or item Question/item wording Item parameters 
Delta Tau(1) Tau(2) Tau(3) 
C_HINRES To what extent is the use of ICT in teaching and learning at your school hindered by each of the following obstacles? 
2G13A Too few computers with an Internet connection 0.76 -0.57 -0.51 1.08 
2G13B nsufficient Internet bandwidth or speed -0.26 -0.68 -0.18 0.85 
2G613C ot enough computers for instruction -0.10 -0.69 -0.40 1.08 
2G613D Lack of sufficiently powerful computers -0.23 -0.99 -0.08 1.07 
2G13E Problems in maintaining ICT equipment -0.16 -1.30 0.08 1.22 
2G13F ot enough computer software 0.00 -1.47 0.04 1.43 
C_HINPED To what extent is the use of ICT in teaching and learning at your school hindered by each of the following obstacles? 
2G613G nsufficient ICT skills among teachers -0.26 -1.97 -0.29 2.26 
2G13H nsufficient time for teachers to prepare lessons -0.24 -1.50 -0.26 1.77 
2613 Lack of effective professional learning resources for -0.10 -1.65 -0.18 1.83 
eachers 
26135 Lack of an effective online learning support platform 0.35 -1.46 0.02 1.44 
2613 Lack of incentives for teachers to integrate ICT use -0.03 -1.39 -0.25 1.64 
in their teaching 
2613 nsufficient pedagogical support for the use of ICT 0.28 -1.64 -0.22 1.86 
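The delta and tau columns in Table 12.46 look like the location and step parameters of a Rasch partial credit model: the three tau values for each item sum to roughly zero, as step deviations from delta do in that parameterization. Assuming that model (the report's own description of questionnaire scaling, not this sketch, is authoritative), the probability of each of the four response categories at a latent score theta can be computed as below. The function name and the choice of theta = 0 are illustrative.

```python
import math


def pcm_probabilities(theta, delta, taus):
    """Category probabilities P(X = 0), ..., P(X = m) under a Rasch partial
    credit model with item location delta and step parameters taus."""
    # Cumulative sums of (theta - delta - tau_k); category 0 has an empty sum of 0.
    exponents = [0.0]
    running = 0.0
    for tau in taus:
        running += theta - delta - tau
        exponents.append(running)
    denominator = sum(math.exp(e) for e in exponents)
    return [math.exp(e) / denominator for e in exponents]


# Item 2G13A from Table 12.46, evaluated at a latent score of 0 logits
probs = pcm_probabilities(0.0, 0.76, [-0.57, -0.51, 1.08])
print([round(p, 3) for p in probs])
```

Under this parameterization, adjacent categories k - 1 and k are equally likely where theta equals delta plus tau(k).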
CHAPTER 13:
The reporting of ICILS 2018 results 


Wolfram Schulz 
Table 13.1: Number of jackknife zones in national samples

Country                              Student data   Teacher data   School data
Chile                                75             15             75
Denmark                              72             69             72
Finland                              74             73             73
France                               75             65             75
Germany                              75             75             75
Italy                                75             74             73
Kazakhstan                           75             75             75
Korea, Republic of                   75             74             75
Luxembourg                           38             28             35
Moscow (Russian Federation)          75             75             75
North Rhine-Westphalia (Germany)     55             54             56
Portugal                             75             75             75
United States                        75             75             75
Uruguay                              75             62             …


Note: Benchmarking participants in italics.

Within each of the sampling zones, we randomly assigned one school a multiplication value of 2 and the other school a value of 0. For each of the sampling zones, one replicate weight was then obtained by multiplying the sampling weights of the students in that zone by their school's multiplication value, while the weights of students in all other zones remained unchanged (see Table 13.2 for an example).

We computed 75 replicate weights regardless of the number of zones. In countries with fewer than 75 zones, the remaining replicate weights were set equal to the original sampling weights and therefore did not contribute to the sampling variance estimate.

Estimating the sampling variance of a statistic, $\mu$, involves computing it once with the sampling weights for the original sample and then with each of the 75 replicate weights separately. The sampling variance $SV_\mu$ is computed using the formula:

$$SV_\mu = \sum_{i=1}^{75} (\mu_i - \mu)^2$$

Here, $\mu$ is the statistic estimated for the population through use of the original sampling weights and $\mu_i$ is the same statistic estimated by using the weights for the $i$th of 75 jackknife replicates. The standard error $SE_\mu$ for statistic $\mu$, which reflects the uncertainty of the estimate due to sampling, is computed as:

$$SE_\mu = \sqrt{SV_\mu}$$


Jackknife repeated replication can be used to compute the sampling variance of any statistic, including means, percentages, standard deviations, correlations, regression coefficients, and mean differences.
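As a minimal sketch of the jackknife formula, the function below takes a full-sample estimate and its 75 replicate estimates and returns the standard error. The function name and numbers are hypothetical. Replicates whose weights equal the original sampling weights reproduce the full-sample estimate, add zero to the sum, and so padding a country with fewer zones up to 75 replicates is harmless.

```python
import math


def jrr_standard_error(full_estimate, replicate_estimates):
    """Sampling SE by jackknife repeated replication.

    SV = sum over replicates of (mu_i - mu)^2, and SE = sqrt(SV).
    """
    sampling_variance = sum((mu_i - full_estimate) ** 2 for mu_i in replicate_estimates)
    return math.sqrt(sampling_variance)


# Hypothetical estimates: only two replicates differ from the full-sample mean;
# replicates whose weights equal the original weights reproduce it exactly.
full = 482.0
replicates = [483.0, 481.0] + [482.0] * 73
se = jrr_standard_error(full, replicates)  # sqrt(2), about 1.41
```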
Table 13.2: Example for computation of replicate weights

ID   Student   School   Jackknife   Jackknife        Multiplication   Replicate   Replicate   Replicate
     weight             zone        replicate code   value            weight 1    weight 2    weight 3
1    5.2       A        1           0                0                0           5.2         5.2
2    5.2       A        1           0                0                0           5.2         5.2
3    5.2       A        1           0                0                0           5.2         5.2
4    5.2       A        1           0                0                0           5.2         5.2
5    9.8       B        1           1                2                19.6        9.8         9.8
6    9.8       B        1           1                2                19.6        9.8         9.8
7    9.8       B        1           1                2                19.6        9.8         9.8
8    9.8       B        1           1                2                19.6        9.8         9.8
9    6.6       C        2           1                2                6.6         13.2        6.6
10   6.6       C        2           1                2                6.6         13.2        6.6
11   6.6       C        2           1                2                6.6         13.2        6.6
12   6.6       C        2           1                2                6.6         13.2        6.6
13   7.2       D        2           0                0                7.2         0           7.2
14   7.2       D        2           0                0                7.2         0           7.2
15   7.2       D        2           0                0                7.2         0           7.2
16   7.2       D        2           0                0                7.2         0           7.2
17   4.9       E        3           1                2                4.9         4.9         9.8
18   4.9       E        3           1                2                4.9         4.9         9.8
19   4.9       E        3           1                2                4.9         4.9         9.8
20   4.9       E        3           1                2                4.9         4.9         9.8
21   8.2       F        3           0                0                8.2         8.2         0
22   8.2       F        3           0                0                8.2         8.2         0
23   8.2       F        3           0                0                8.2         8.2         0
24   8.2       F        3           0                0                8.2         8.2         0
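The replicate weights in Table 13.2 can be reproduced with a short routine. This is a sketch of the pattern the table shows: in replicate r, students in zone r get their weight times the multiplication value (2 for replicate code 1, 0 for code 0), and all other students keep their original weight. The function name and tuple layout are hypothetical, and the example uses one student per school from the table.

```python
def replicate_weights(students, n_replicates=75):
    """Build JRR replicate weights.

    students: list of (sampling_weight, zone, replicate_code) tuples.
    In replicate r, students in zone r get weight * multiplication value
    (2 if replicate_code == 1, else 0); everyone else keeps the original weight.
    """
    replicates = []
    for r in range(1, n_replicates + 1):
        replicates.append([
            w * (2 if code == 1 else 0) if zone == r else w
            for (w, zone, code) in students
        ])
    return replicates


# One student per school from Table 13.2 (schools A to F)
students = [(5.2, 1, 0), (9.8, 1, 1), (6.6, 2, 1), (7.2, 2, 0), (4.9, 3, 1), (8.2, 3, 0)]
for r, weights in enumerate(replicate_weights(students, n_replicates=3), start=1):
    print(f"Replicate weight {r}:", weights)
```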


Standard statistical software does not always include procedures for replication techniques. For ICILS 2018, we used tailored Statistical Package for the Social Sciences (SPSS) software macros (IBM Corp 2017). These results can be replicated by using the IEA International Database (IDB) Analyzer, which is generally recommended as a tool for analyzing IEA data.¹ Alternatively, analysts can use other specialized software, such as WesVar (Westat 2007), tailored applications like the SPSS Replicates Module developed by ACER,² or the replication procedures built into statistical software such as Stata (StataCorp 2013) or SAS (SAS Institute Inc. 2017).

1 The IDB Analyzer is an application that allows the user to combine and analyze data from IEA's large-scale assessments such as TIMSS, PIRLS, ICCS, and ICILS by creating SPSS or SAS syntax, which can be used with the respective statistical software. The application can be downloaded at https://www.iea.nl/data-tools/tools.

2 The module is an add-in component running under SPSS and offers a number of features for applying different replication methods when estimating sampling and imputation variance. The application can be downloaded at https://iccs.acer.org/iccs-2016-reports/.
Estimation of imputation variance for CIL and CT scores


The estimation of sampling variance as described above is sufficient for any analysis not involving test scores for CIL or CT. When estimating standard errors of estimates involving test scores for CIL and CT, it is important to additionally take the imputation variance into account, which provides an estimate of measurement variance (see Chapter 11 for a description of the scaling methodology for ICILS 2018 test items). Therefore, population statistics and their standard errors for ICILS 2018 CIL and CT scores should always be estimated using all five plausible values.


If $\theta$ is the international CIL or CT score and $\mu_p$ is the statistic of interest computed on each plausible value $p$ of $\theta$, then the statistic $\mu$ based on all $P$ plausible values can be computed as follows:

$$\mu = \frac{1}{P}\sum_{p=1}^{P} \mu_p$$

The sampling variance $SV_\mu$ is calculated as the average of the sampling variance for each plausible value $SV_{\mu_p}$:

$$SV_\mu = \frac{1}{P}\sum_{p=1}^{P} SV_{\mu_p}$$

Use of the $P$ plausible values for data analysis also allows an estimation of the amount of error associated with the measurement of CIL and CT. The measurement variance or imputation variance $IV_\mu$ is computed as:

$$IV_\mu = \frac{1}{P-1}\sum_{p=1}^{P} (\mu_p - \mu)^2$$

Here, $\mu_p$ is the statistic of interest computed on each plausible value $p$ and $\mu$ is the mean statistic based on all $P$ plausible values.

The estimate of the total variance $TV_\mu$, consisting of sampling variance and imputation variance, can be computed as:

$$TV_\mu = SV_\mu + \left(1 + \frac{1}{P}\right) IV_\mu$$

The estimate of the final standard error $SE_\mu$ is equal to:

$$SE_\mu = \sqrt{TV_\mu}$$


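The following formula illustrates the whole process of the computation of standard errors for a statistic $\mu$ based on $P$ plausible values for 75 replicates, where $\mu_{i,p}$ is the statistic for the $i$th replicate of the $p$th plausible value, and $\mu_p$ is the statistic for the $p$th plausible value with the original sampling weights:

$$SE_\mu = \sqrt{\frac{1}{P}\sum_{p=1}^{P}\sum_{i=1}^{75} (\mu_{i,p} - \mu_p)^2 + \left(1 + \frac{1}{P}\right)\frac{1}{P-1}\sum_{p=1}^{P} (\mu_p - \mu)^2}$$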
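The plausible-value formulas combine in a few lines. The sketch below, with hypothetical inputs, returns the average statistic across the P plausible values and its standard error: the average JRR sampling variance plus (1 + 1/P) times the imputation variance. The function name and input layout are illustrative assumptions.

```python
import math
from statistics import mean


def pv_estimate_and_se(pv_estimates, pv_replicates):
    """Combine P plausible-value estimates with their JRR replicate estimates.

    pv_estimates:  P statistics computed with the original sampling weights.
    pv_replicates: P lists with the statistic for each of the 75 replicates.
    """
    P = len(pv_estimates)
    mu = mean(pv_estimates)
    # Sampling variance: average over plausible values of the JRR variance.
    sv = mean(sum((r - est) ** 2 for r in reps)
              for est, reps in zip(pv_estimates, pv_replicates))
    # Imputation variance: spread of the P estimates around their mean.
    iv = sum((est - mu) ** 2 for est in pv_estimates) / (P - 1)
    tv = sv + (1 + 1 / P) * iv
    return mu, math.sqrt(tv)


# Hypothetical country means for five plausible values
pvs = [500.0, 502.0, 498.0, 501.0, 499.0]
reps = [[pv + 1.0, pv - 1.0] + [pv] * 73 for pv in pvs]
mu, se = pv_estimate_and_se(pvs, reps)  # mu = 500.0, se = sqrt(5)
```

Here the sampling variance is 2.0 and the imputation variance 2.5, so the total variance is 2.0 + 1.2 × 2.5 = 5.0.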


Table 13.3 shows the average scale scores for CIL as well as their sampling and overall standard 
errors, while Table 13.4 displays the corresponding information for CT. The tables also record 
the number of students that were assessed in each country. The comparison between sampling 
and combined standard error shows that for both assessment domains most of the error was 
due to sampling and that (at the level of national samples) only a relatively small proportion was 
attributable to measurement error. 
Table 13.3: National averages for CIL with standard deviations, sampling, and overall errors

Country                              Average     Sampling   Combined   Number of
                                     CIL score   error      standard   assessed
                                                            error      students
Chile                                476         3.71       3.73       3092
Denmark                              553         2.00       2.04       2404
Finland                              531         2.96       2.98       2546
France                               499         2.26       …          2940
Germany                              518         2.81       2.95       3655
Italy                                461         2.69       …          2810
Kazakhstan                           395         5.33       …          3371
Korea, Republic of                   542         2.95       …          2875
Luxembourg                           482         0.68       0.83       5401
Moscow (Russian Federation)          549         …          …          2852
North Rhine-Westphalia (Germany)     515         2.55       2.63       1991
Portugal                             516         …          …          …
United States*                       519         1.86       1.89       6790
Uruguay                              450         4.19       4.29       …

Notes: Benchmarking participants in italics.
* Countries not meeting sample participation requirements.
Table 13.4: National averages for CT with standard deviations, sampling, and overall errors

Country                              Average     Sampling   Combined   Number of
                                     CT score    error      standard   assessed
                                                            error      students
Denmark                              527         2.29       2.32       2404
Finland                              508         …          …          2546
France                               501         2.31       2.38       2940
Germany                              486         3.54       3.63       3655
Korea, Republic of                   536         4.26       4.42       2875
Luxembourg                           460         0.86       0.89       5401
North Rhine-Westphalia (Germany)     485         2.87       2.96       1991
Portugal                             482         2.48       …          …
United States*                       498         2.48       2.54       6790

Notes: Benchmarking participants in italics.
* Countries not meeting sample participation requirements.


Reporting of differences


Differences in population estimates between and within countries 


We considered differences between two score averages (or percentages) $a$ and $b$ significant ($p < 0.05$) when the absolute value of the test statistic $t$ was greater than the critical value, 1.96. We calculated $t$ by dividing the difference by its standard error, $SE_{dif\_ab}$:

$$t = \frac{a - b}{SE_{dif\_ab}}$$
In the case of differences between score averages from independent samples (evident, for example, with respect to comparisons of two country averages), the standard error of the difference $SE_{dif\_ab}$ can be computed as:

$$SE_{dif\_ab} = \sqrt{SE_a^2 + SE_b^2}$$

Here, $SE_a$ and $SE_b$ are the standard errors of the means from the two independent samples $a$ and $b$.
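For illustration, the independent-samples formula can be applied to the Denmark and Finland CIL averages in Table 13.3, using the combined standard errors listed there. This sketch shows only the arithmetic of the test; it is not a reproduction of a comparison reported elsewhere in the study, and the function name is hypothetical.

```python
import math


def t_statistic_independent(a, se_a, b, se_b):
    """t for the difference between two averages from independent samples."""
    se_dif = math.sqrt(se_a ** 2 + se_b ** 2)
    return (a - b) / se_dif


# CIL averages and combined standard errors from Table 13.3
t = t_statistic_independent(553, 2.04, 531, 2.98)  # Denmark vs Finland
significant = abs(t) > 1.96
print(round(t, 2), significant)
```

The 22-point difference has a standard error of about 3.61, giving t of about 6.09, well above 1.96.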


The formula for calculating the standard error provided above is only suitable when the subsamples 
being compared are independent. Because subgroups (e.g., gender groups) within countries are 
typically not independent samples, we derived the difference between statistics for subgroups 
of interest and the standard error of the difference by using jackknife repeated replication that 
involved the following formula: 


$$SE_{dif\_ab} = \sqrt{\sum_{i=1}^{75} \left((a_i - b_i) - (a - b)\right)^2}$$

Here, $a$ and $b$ represent the averages (or percentages) in each of the two subgroups for the fully weighted sample, and $a_i$ and $b_i$ are those for the replicate samples.


In the case of differences in CIL and CT scores between dependent subsamples, we calculated the standard error of the differences with $P = 5$ plausible values by using this formula:

$$SE_{dif\_ab} = \sqrt{\frac{1}{P}\sum_{p=1}^{P}\sum_{i=1}^{75} \left((a_{i,p} - b_{i,p}) - (a_p - b_p)\right)^2 + \left(1 + \frac{1}{P}\right)\frac{1}{P-1}\sum_{p=1}^{P} \left((a_p - b_p) - (\bar{a} - \bar{b})\right)^2}$$

Here, $a_p$ and $b_p$ represent the weighted subgroup averages in groups $a$ and $b$ for each of the $P$ plausible values, $a_{i,p}$ and $b_{i,p}$ are the subgroup averages within replicate samples for each of the $P$ plausible values, and $\bar{a}$ and $\bar{b}$ are the means of the two weighted subgroup averages across the $P$ plausible values.
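A sketch of the dependent-subgroup formula follows, assuming the per-plausible-value differences and their replicate differences have already been computed (for example, from replicate-weighted gender group means). With a single value instead of plausible values, the imputation term drops out and the function reduces to the jackknife formula for non-test variables. The function name and inputs are hypothetical.

```python
import math
from statistics import mean


def subgroup_difference_se(diffs, replicate_diffs):
    """SE of a difference between two dependent subgroups.

    diffs:           P full-sample differences (a_p - b_p), one per plausible value.
    replicate_diffs: P lists of replicate differences (a_ip - b_ip), 75 each.
    """
    P = len(diffs)
    d_bar = mean(diffs)
    sampling = mean(sum((r - d) ** 2 for r in reps)
                    for d, reps in zip(diffs, replicate_diffs))
    if P == 1:
        # Non-test variables: plain JRR, no imputation component.
        return d_bar, math.sqrt(sampling)
    imputation = sum((d - d_bar) ** 2 for d in diffs) / (P - 1)
    return d_bar, math.sqrt(sampling + (1 + 1 / P) * imputation)


# Hypothetical gender differences for five plausible values
diffs = [10.0, 12.0, 8.0, 11.0, 9.0]
reps = [[d + 0.5, d - 0.5] + [d] * 73 for d in diffs]
d_bar, se = subgroup_difference_se(diffs, reps)  # 10.0, sqrt(3.5)
```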


Comparisons between countries and ICILS averages 


The standard error of the ICILS 2018 average $SE_{ICILS\,2018}$ was calculated based on the respective standard error for each of the national statistics ($SE_k$) and the number ($N$) of countries meeting IEA sample participation requirements that were included in the average (11 for student survey results, 7 for teacher survey results):

$$SE_{ICILS\,2018} = \frac{\sqrt{\sum_{k=1}^{N} SE_k^2}}{N}$$


When comparing a country mean $c$ with the overall ICILS 2018 average $i$, we had to account for the fact that the country being considered had contributed to the international standard error. We did this by calculating the standard error $SE_{dif\_ic}$ of the difference between the overall ICILS 2018 average and an individual country average as:

$$SE_{dif\_ic} = \frac{\sqrt{\left((N-1)^2 - 1\right) SE_c^2 + \sum_{k=1}^{N} SE_k^2}}{N}$$

Here, $SE_c$ is the sampling standard error for country $c$ and $SE_k$ is the sampling standard error for the $k$th of $N$ participating and reported countries. We used this formula for determining the statistical significance of differences due to sampling error between countries and the ICILS 2018 averages of all questionnaire percentages or scale point averages throughout the ICILS 2018 reports.
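The two formulas for the ICILS 2018 average can be sketched as follows, with hypothetical standard errors and function names. Because the second function accounts for the country's own contribution to the international average, its result is smaller than the naive root sum of squares of the country and average standard errors.

```python
import math


def se_international_average(ses):
    """SE of the ICILS average from the N national standard errors."""
    return math.sqrt(sum(se ** 2 for se in ses)) / len(ses)


def se_country_vs_average(se_c, ses):
    """SE of the difference between country c and the average; ses includes se_c."""
    n = len(ses)
    return math.sqrt(((n - 1) ** 2 - 1) * se_c ** 2 + sum(se ** 2 for se in ses)) / n


ses = [2.0, 2.0, 2.0, 2.0]  # hypothetical national SEs (N = 4)
se_avg = se_international_average(ses)       # 1.0
se_dif = se_country_vs_average(2.0, ses)     # sqrt(3), about 1.73
naive = math.sqrt(2.0 ** 2 + se_avg ** 2)    # sqrt(5), about 2.24
```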
When comparing the CIL and CT score averages of a country with the overall ICILS 2018 average, it was necessary to also take the imputation component of the standard errors for countries into account. The imputation variance component of the standard error, $SE^2_{dif\_ic,imp}$, was given as:

$$SE^2_{dif\_ic,imp} = \left(1 + \frac{1}{P}\right)\frac{1}{P-1}\sum_{p=1}^{P} \left(d_p - \bar{d}\right)^2$$

Here, $d_p$ is the difference between the overall ICILS 2018 average and the country mean for plausible value $p$, and $\bar{d}$ is the average of these differences across the $P$ plausible values.

The sampling error for CIL and CT scores was calculated as follows:

$$SE_{dif\_ic,samp} = \frac{1}{N}\sqrt{\frac{1}{P}\sum_{p=1}^{P} \left[\left((N-1)^2 - 1\right) SE_{cp}^2 + \sum_{k=1}^{N} SE_{kp}^2\right]}$$

Here, $SE_{cp}$ is the sampling standard error for country $c$ and plausible value $p$, and $SE_{kp}$ is the sampling error for plausible value $p$ in the $k$th of $N$ participating and reported countries.

We computed the final standard error ($SE_{dif\_icp}$) of the difference between national CIL and CT test scores and the ICILS 2018 averages as:

$$SE_{dif\_icp} = \sqrt{SE^2_{dif\_ic,samp} + SE^2_{dif\_ic,imp}}$$
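Combining the sampling and imputation components for test scores can be sketched as below. The inputs per plausible value and the function name are hypothetical; as in the sum over k, the country's own standard error must also appear in each list of N country standard errors.

```python
import math


def se_country_vs_average_scores(se_c_by_pv, ses_by_pv, diffs_by_pv):
    """SE of the difference between a country's CIL/CT mean and the ICILS average.

    se_c_by_pv:  P sampling SEs for country c (one per plausible value).
    ses_by_pv:   P lists of the N countries' sampling SEs (country c included).
    diffs_by_pv: P differences between the ICILS average and the country mean.
    """
    P = len(diffs_by_pv)
    n = len(ses_by_pv[0])
    # Sampling component: average over plausible values, divided by N^2.
    samp_var = sum(((n - 1) ** 2 - 1) * se_c ** 2 + sum(se ** 2 for se in ses)
                   for se_c, ses in zip(se_c_by_pv, ses_by_pv)) / (P * n ** 2)
    # Imputation component: variation of the difference across plausible values.
    d_bar = sum(diffs_by_pv) / P
    imp_var = (1 + 1 / P) * sum((d - d_bar) ** 2 for d in diffs_by_pv) / (P - 1)
    return math.sqrt(samp_var + imp_var)


se = se_country_vs_average_scores(
    [2.0] * 5,                       # hypothetical SEs for country c
    [[2.0] * 4 for _ in range(5)],   # hypothetical SEs for N = 4 countries
    [1.0, 3.0, -1.0, 2.0, 0.0],      # hypothetical differences per plausible value
)
```

In this example both components equal 3.0, so the standard error is the square root of 6.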


Comparisons between benchmarking participants and ICILS averages 


When comparing averages for benchmarking participants, Moscow (Russian Federation) constituted an independent student sample in relation to the ICILS 2018 average, while North Rhine-Westphalia was a sub-sample of the German national student sample (representing 22.5% of the corresponding student population). Therefore, two different formulas had to be applied. For the teacher survey, the German teacher survey did not meet IEA sample participation requirements and was therefore not included in the ICILS 2018 average for results from the teacher survey. Consequently, North Rhine-Westphalia, which had met sample participation requirements as a benchmarking participant, constituted an independent sample in relation to the ICILS 2018 average for teacher results.


Standard errors for differences between ICILS 2018 averages and Moscow (Russian Federation), 
both for student and teacher results, as well as North Rhine-Westphalia (Germany) with regard 
to teacher results were computed with the following formula: 


$$SE_{dif\_ic} = \sqrt{SE_{bp}^2 + SE_{ICILS\,2018}^2}$$

Here, $SE_{bp}$ is the standard error for the result from the benchmarking participant, and $SE_{ICILS\,2018}$ is the standard error for the ICILS 2018 average.
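For an independent benchmarking sample, the calculation is a simple root sum of squares, sketched here with hypothetical standard errors and a hypothetical function name.

```python
import math


def se_independent_benchmark(se_bp, se_icils_average):
    """SE of the difference between an independent benchmarking sample and the ICILS average."""
    return math.sqrt(se_bp ** 2 + se_icils_average ** 2)


se = se_independent_benchmark(3.0, 4.0)  # hypothetical SEs
print(se)  # 5.0
```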


For differences between student (or school) survey results in North Rhine-Westphalia (Germany) and the ICILS 2018 average, the standard error $SE_{dif\_StNRW}$ was computed as follows:

$$SE_{dif\_StNRW} = \frac{\sqrt{\left((N-1)^2 - 0.225\right) SE_{NRW}^2 + \sum_{k=1}^{N} SE_k^2}}{N}$$

Here, $SE_{NRW}$ represents the standard error for the student survey estimate from North Rhine-Westphalia and $SE_k$ is the sampling error for the $k$th of $N$ participating countries. The correction for the contribution of the data from this benchmarking participant was set to 0.225 (instead of 1) as its student population was equivalent to 22.5 percent of the overall German population.
Comparisons between CIL results in 2018 and 2013


For those countries that had participated in the previous cycle, the ICILS 2018 international report also included comparisons of test results for CIL between ICILS 2013 and 2018. Because the process of equating the CIL scores across the cycles introduced some additional error into the calculation of any test statistic, we added an equating error term to the formula for the standard error of the difference between country averages.

When testing the difference of a statistic between the two assessments, we computed the standard error of the difference as follows:

$$SE(\mu_{ICILS18} - \mu_{ICILS13}) = \sqrt{SE_{\mu ICILS18}^2 + SE_{\mu ICILS13}^2 + EqErr^2}$$

Here, $\mu$ can be any statistic in units on the equated ICILS scale (mean, percentile, gender difference, but not percentages) and $SE_{\mu ICILS18}$ and $SE_{\mu ICILS13}$ are the respective standard errors of this statistic from the two surveys. $EqErr$ denotes the equating error that reflects the uncertainty in the link between both assessments, which was equal to 3.9 score points for the CIL scale (see Chapter 11 for the calculation of the respective equating error).
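A sketch of the trend formula with the CIL equating error of 3.9 score points follows; the two survey standard errors in the example and the function name are hypothetical.

```python
import math

CIL_EQUATING_ERROR = 3.9  # score points, as stated in the text


def se_trend(se_2018, se_2013, eq_err=CIL_EQUATING_ERROR):
    """SE of a 2018-2013 difference in a statistic on the equated CIL scale."""
    return math.sqrt(se_2018 ** 2 + se_2013 ** 2 + eq_err ** 2)


se = se_trend(3.0, 4.0)  # hypothetical SEs from the two cycles, about 6.34
```

In this example the equating error contributes 15.21 of the 40.21 squared score points, so it is a substantial part of the uncertainty in any trend comparison.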


To report the significance of differences between ICILS 2018 and 2013 for percentages of students with CIL scores at or above Level 2 (see Chapter 1 for a description of these levels), it was not possible to use the estimated equating error in CIL score points. Therefore, we applied the following replication method to estimate the corresponding equating errors.

To estimate the standard error of the percentage at or above the cut-point that defines the threshold between Levels 1 and 2 (492), within each participating country a number of $N$ replicate cut-points were generated by computing the observed threshold plus a random error component with a mean of 0 and a standard deviation equal to the estimated equating error (3.9). Percentages of students at or above each replicate cut-point ($p_n$) were computed for each of these replicated thresholds, and the equating error for each participating country was estimated as:

$$EquErr_{p\_country} = \sqrt{\frac{\sum_{n=1}^{N} (p_n - p_o)^2}{N}}$$

Here, $p_o$ is the observed percentage of students at or above Level 2. We used 1000 replicate samples ($N = 1000$) for these computations. The equating errors for the national percentages of students at or above Level 2 were estimated as 1.95 for Chile, 1.23 for Denmark, 1.45 for Germany, and 1.18 for Korea.

Within each participating country, the standard errors for the differences between percentages at or above Level 2 were calculated based on the standard error of the percentage at or above Level 2 in 2018 ($SE_{pICILS18}$), the standard error for the corresponding estimate in 2013 ($SE_{pICILS13}$), and the estimated equating error ($EquErr_{p\_country}$) as follows:

$$SE(p_{ICILS18} - p_{ICILS13}) = \sqrt{SE_{pICILS18}^2 + SE_{pICILS13}^2 + EquErr_{p\_country}^2}$$
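The replication approach for the equating error of percentages, and the resulting standard error of a percentage difference, can be sketched as below. This is a simplified illustration under stated assumptions. It uses one hypothetical set of weighted scores rather than plausible values, draws normally distributed shifts of the 492 cut-point with a standard deviation of 3.9 using Python's `random` module, and uses hypothetical function names.

```python
import math
import random


def percent_at_or_above(scores, weights, cut):
    """Weighted percentage of students with a score at or above cut."""
    total = sum(weights)
    return 100.0 * sum(w for s, w in zip(scores, weights) if s >= cut) / total


def equating_error_percent(scores, weights, cut=492.0, eq_err=3.9, n_reps=1000, seed=2018):
    """Equating error of a percentage via randomly shifted replicate cut-points."""
    rng = random.Random(seed)
    p_obs = percent_at_or_above(scores, weights, cut)
    squared = 0.0
    for _ in range(n_reps):
        # Replicate cut-point: observed threshold plus N(0, eq_err) noise.
        p_n = percent_at_or_above(scores, weights, cut + rng.gauss(0.0, eq_err))
        squared += (p_n - p_obs) ** 2
    return math.sqrt(squared / n_reps)


def se_percent_trend(se_2018, se_2013, equ_err_country):
    """SE of the 2018-2013 difference in a percentage at or above Level 2."""
    return math.sqrt(se_2018 ** 2 + se_2013 ** 2 + equ_err_country ** 2)


# Hypothetical scores spread evenly around the Level 2 threshold
scores = [400.0 + i for i in range(200)]
weights = [1.0] * len(scores)
eq = equating_error_percent(scores, weights)
se = se_percent_trend(1.5, 2.0, eq)
```

When few students score near the cut-point, shifting it barely changes the percentage and the equating error approaches zero. The more students who cluster near 492, the larger the error.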


Multiple regression modeling of teacher data 


When reporting ICILS 2018 data, we also used single-level multiple regression models to explain variation in the questionnaire scale scores reflecting teachers' emphasis on teaching CIL and CT, respectively. Predictor variables were teachers' ICT self-efficacy, positive perceptions of pedagogical ICT use, perceptions of higher levels of teacher collaboration, reports on higher levels of availability of ICT resources at their school, and teachers' experience with using ICT during lessons.
Multiple regression models (see, for example, Pedhazur 1997) were estimated as:

$$Y_i = \beta_0 + \boldsymbol{\beta}\mathbf{X}_i + \varepsilon_i$$

Here, for a sample of $i$ teachers we regressed our criterion variables $Y_i$ (teachers' emphasis on teaching CIL or CT) on a vector of predictors $\mathbf{X}_i$ with its corresponding vector of regression coefficients $\boldsymbol{\beta}$, where $\beta_0$ denotes the intercept and $\varepsilon_i$ represents the unexplained part of the model (residual).

We reported the unstandardized regression coefficients and the variance explained by the model to show the effects for each predictor and the overall explanatory power of the model. To estimate standard errors for the multiple regression model parameters, we employed jackknife repeated replication using tailored SPSS macros, which can be exactly replicated with the IEA IDB Analyzer.

Table 13.5 shows the numbers of all assessed teachers in each country, of teachers included in each of the multiple regression analyses, as well as the weighted percentages of teachers with valid data for all variables in each of the two estimated models.
Table 13.5: ICILS 2018 teachers included in multiple regression analyses of teachers' emphasis on
teaching CIL- and CT-related skills

Country                              Multiple regression analysis of          Multiple regression analysis of
                                     teachers' emphasis on                    teachers' emphasis on
                                     CIL-related skills                       CT-related skills
                                     Total number   Weighted percentage       Total number   Weighted percentage
                                     of teachers    of teachers               of teachers    of teachers
                                     in analysis    in analysis               in analysis    in analysis
Chile                                1625           96                        1622           96
Denmark                              1081           …                         1078           …
Finland                              1797           97                        1795           97
France                               1405           96                        1400           96
Germany                              2212           94                        2207           94
Italy                                2534           …                         …              …
Kazakhstan                           2534           96                        2535           96
Korea, Republic of                   2068           97                        2048           96
Luxembourg                           473            96                        472            96
Moscow (Russian Federation)          2189           97                        2185           97
North Rhine-Westphalia (Germany)     1398           94                        1400           95
Portugal                             2743           …                         2750           98
United States*                       3103           96                        3103           96
Uruguay                              1178           90                        1176           90

Notes: Benchmarking participants in italics.
* Countries not meeting sample participation requirements.


Across the participating countries, the average percentage of teachers in the sample with valid data for all variables was 96 percent. National percentages of teachers with valid data for all variables ranged from 90 percent in Uruguay to 98 percent in Portugal.³


3 Readers should note that when models with a larger number of predictor variables are applied, the proportion of missing values is likely to increase if respondents with omitted data for any of the variables in the analysis are excluded (list-wise).
Hierarchical linear modeling to explain variation in students' CIL and CT


To review which factors are associated with variation in CIL and CT within and across schools within participating countries, we estimated within each country hierarchical (or multilevel) linear regression models (Raudenbush and Bryk 2002) in which students were nested within schools. Predictor variables included variables reflecting students' personal and social background, ICT-related variables at the student level, and ICT-related factors at the school level.


A hierarchical regression model with $i$ students nested in $j$ clusters (schools) can be estimated as:

$$Y_{ij} = \beta_0 + \boldsymbol{\beta}_i\mathbf{X}_{ij} + \boldsymbol{\beta}_j\mathbf{X}_j + u_{0j} + e_{ij}$$

Here, $Y_{ij}$ is the criterion variable, $\beta_0$ is the intercept, $\mathbf{X}_{ij}$ is a vector of student-level variables with its corresponding vector of regression coefficients $\boldsymbol{\beta}_i$, and $\mathbf{X}_j$ is a vector of school-level variables with its corresponding vector of regression coefficients $\boldsymbol{\beta}_j$. $u_{0j}$ is the residual term at the level of the cluster (school), and $e_{ij}$ is the student-level residual. Both residual terms are assumed to be normally distributed with a mean of 0 at each level.


The explained variance in hierarchical linear models has to be estimated for each level separately, 
with the estimate based on a comparison of each prediction model with the baseline (“null”) model 
(or ANOVA model) without any predictor variables. 


We estimated the null model, from which we excluded students with missing data after completing 
“missing treatment” (see section on missing treatment below) as: 


$$Y_{ij} = \beta_0 + u_{0j(null)} + e_{ij(null)}$$

The residual term $u_{0j(null)}$ provides an estimate of the variance in $Y_{ij}$ between $j$ clusters, and $e_{ij(null)}$ provides an estimate of the variance between $i$ students within clusters. The intra-class correlation $IC$, which reflects the proportion of variance between clusters (in our case, schools), can be computed from these variance estimates as:

$$IC = \frac{u_{0j(null)}}{u_{0j(null)} + e_{ij(null)}}$$

Based on the estimates of variances at school and student level derived from the null model, we computed the explained variance at the school level $EV_j$ as:

$$EV_j = \left(1 - \frac{u_{0j}}{u_{0j(null)}}\right) \times 100$$

We computed the explained variance at the student level $EV_i$ as:

$$EV_i = \left(1 - \frac{e_{ij}}{e_{ij(null)}}\right) \times 100$$

Here, $u_{0j}$ and $e_{ij}$ denote the corresponding variance estimates from the model that includes the predictor variables.
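Given the variance components estimated by the multilevel software, the intra-class correlation and the explained variance at each level are simple ratios. The variance values and function names below are hypothetical.

```python
def intraclass_correlation(u_null, e_null):
    """Share of variance between schools in the null model."""
    return u_null / (u_null + e_null)


def explained_variance(var_model, var_null):
    """Percentage of null-model variance explained at one level."""
    return (1 - var_model / var_null) * 100


# Hypothetical variance components
ic = intraclass_correlation(u_null=2000.0, e_null=6000.0)           # 0.25
ev_school = explained_variance(var_model=1200.0, var_null=2000.0)   # about 40 percent
ev_student = explained_variance(var_model=5400.0, var_null=6000.0)  # about 10 percent
```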


Because multilevel modeling takes the hierarchical structure of the cluster sample into account 
and we used plausible values as estimates of students’ CIL and CT in our models, the reported 
multilevel standard errors reflect both sampling and imputation errors. 


National data were weighted with normalized school-level and (within-school) student-level weights, where the sum of within- and between-school weights is equal to the sample size (Asparouhov 2006). Normalized (or scaled) within-school student-level weights $w_{ij}$ were calculated as:

$$w_{ij} = ow_{ij}\,\frac{n_j}{\sum_{i=1}^{n_j} ow_{ij}}$$

Here, $n_j$ represents the cluster size (equal to the number of students with valid data within each school) and $ow_{ij}$ the original within-school student-level weights. Normalized (or scaled) school-level weights $w_j$ were computed as:

$$w_j = ow_j\,\frac{n}{\sum_{j}\sum_{i} ow_j\,w_{ij}}$$

Here, $n$ is the total sample size and $ow_j$ denotes the original school-level weights.
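The weight normalization can be sketched as follows, assuming per-student school IDs and original weights as inputs (the function name and data are hypothetical). Within-school weights are rescaled to sum to each school's number of students with valid data, and school weights are rescaled so that the products of school and student weights sum to the total sample size n.

```python
from collections import defaultdict


def normalized_weights(school_ids, student_ow, school_ow):
    """Scale weights for two-level models.

    school_ids: school ID for each student with valid data.
    student_ow: original within-school student weight for each student.
    school_ow:  dict mapping school ID to its original school weight.
    Returns (w_ij list, w_j dict).
    """
    n = len(school_ids)
    weight_sum = defaultdict(float)
    cluster_size = defaultdict(int)
    for s, ow in zip(school_ids, student_ow):
        weight_sum[s] += ow
        cluster_size[s] += 1
    # Within-school weights sum to the cluster size n_j in every school.
    w_ij = [ow * cluster_size[s] / weight_sum[s] for s, ow in zip(school_ids, student_ow)]
    # School weights scaled so that the sum over students of w_j * w_ij equals n.
    denominator = sum(school_ow[s] * w for s, w in zip(school_ids, w_ij))
    w_j = {s: school_ow[s] * n / denominator for s in cluster_size}
    return w_ij, w_j


ids = ["A", "A", "B", "B", "B"]
w_ij, w_j = normalized_weights(ids, [1.0, 3.0, 2.0, 2.0, 4.0], {"A": 10.0, "B": 20.0})
# w_ij -> [0.5, 1.5, 0.75, 0.75, 1.5]; w_j -> {"A": 0.625, "B": 1.25}
```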


We used the software package Mplus (Version 7; see Muthén and Muthén 2012) to estimate all hierarchical models. Even though Luxembourg had met the sample participation requirements for the student survey and 38 out of 41 of its schools with target grade students participated, we excluded its data from the analyses; the low number of schools would have resulted in greatly reduced statistical power and rather limited precision in the estimation of school-level effects. Results from the United States, which did not meet IEA sample participation requirements, were reported separately and should be interpreted with caution.


As is typical when applying multivariate analyses, we observed increases in the proportions of missing data when including more variables in the model. To account for the higher proportions of missing responses for some of the variables, the analysis included a "dummy variable adjustment" for these data (see Cohen and Cohen 1975). For each of these variables, we assigned mean or median values to cases with missing data and added dummy indicator variables (with 1 indicating a missing value and 0 a non-missing value) to the analysis.
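Below is a minimal sketch of the dummy variable adjustment for one predictor. It uses unweighted mean substitution, one of the two options mentioned (mean or median). Missing values are represented as None, and the function name is hypothetical. The returned indicator would enter the model as an additional predictor alongside the adjusted variable.

```python
from statistics import mean


def dummy_variable_adjustment(values):
    """Mean substitution plus a missing-data indicator (1 = missing, 0 = observed).

    values: predictor values with None marking missing responses.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    adjusted = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return adjusted, indicator


adjusted, indicator = dummy_variable_adjustment([1.0, None, 3.0, None])
# adjusted -> [1.0, 2.0, 3.0, 2.0]; indicator -> [0, 1, 0, 1]
```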


At the student level we observed higher proportions of missing values for the scale reflecting use of general ICT applications in class and the scales measuring students' perceptions of learning of CIL- or CT-related skills (which were included as predictor variables for explaining CIL and CT, respectively). Information from teachers on the two predictor variables from this survey (average years of teachers' experience with ICT use during lessons and teachers' reports on students' ICT use for class activities) tended, in almost all cases, to be either present for both or missing for both. Only one missing indicator was therefore created to indicate missing teacher data for both variables. Two further missing indicators were used for missing school principal data regarding schools' expectations of teacher communication via ICT, and for missing ICT coordinator data about the availability of ICT resources at school.


Table 13.6 shows the coefficients for missing indicators included in the multilevel analyses of CIL and CT in each country. Student-level missing indicators tended to be negatively related with CIL and CT, respectively, while patterns were less consistent for school-level predictors derived from teacher, principal, and ICT coordinator surveys. In countries where there were no schools with missing information from either school or teacher questionnaires, it was not necessary to apply any treatment. Consequently, no missing indicator coefficients are displayed in this table for these cases (as in Korea for school principal and ICT coordinator questionnaire data, in Kazakhstan for teacher information, and in Moscow for all school-level data).


Table 13.7 shows the numbers of students included in the multilevel analyses, as well as the respective weighted percentages of students with valid data for all variables in the model.


For the multilevel analyses of CIL, 92 percent of students, on average across ICILS 2018 countries meeting sample participation requirements, had valid data for all variables included in the model. In Germany and Uruguay, however, only 84 and 83 percent (respectively) of the weighted sample had valid data for inclusion in the analyses of CIL, and consequently their results should be interpreted with due caution. For the analysis of variation in CT, on average 94 percent of students had valid data for all variables in the model. However, in Germany, only 84 percent of the weighted sample could be included in these analyses, with similar caveats that should be observed when interpreting the respective results.
Table 13.7: ICILS 2018 students included in multilevel analyses of variation in CIL and CT

                                     CIL                              CT
Country                              Number of     Weighted %         Number of     Weighted %
                                     students      of students        students      of students
                                     in analysis   in analysis        in analysis   in analysis
Chile                                2888          93
Denmark                              …             …                  …             …
Finland                              2453          97                 2453          97
France                               2686          92                 2686          92
Germany                              2983          84                 2983          84
Italy                                …             …
Kazakhstan                           3169          95
Korea, Republic of                   2786          97                 2786          97
Moscow (Russian Federation)          2718          95
North Rhine-Westphalia (Germany)     1562          79                 1562          79
Portugal                             3081          96                 3081          96
United States*                       5829          86                 5829          86
Uruguay                              2126          83

Notes: Benchmarking participants in italics.
* Countries not meeting sample participation requirements.
APPENDIX A:


Organizations and individuals involved 
in ICILS 2018 


International study center 




The international study center is located at the Australian Council for Educational Research (ACER). ACER is responsible for designing and implementing the study in close cooperation with IEA.


Staff at ACER 

Julian Fraillon, research director 

John Ainley, project coordinator 

Wolfram Schulz, assessment coordinator 

Tim Friedman, project researcher 

Daniel Duckworth, test developer 
Melissa Hughes, test developer

Laila Helou, quality assurer 

Alex Daraganov, data analyst 

Renee Kwong, data analyst 

Leigh Patterson, data analyst 

Louise Ockwell, data analyst 

Katja Bischof, project researcher 


International Association for the Evaluation of Educational Achievement (IEA) 


IEA provided overall support in coordinating and implementing ICILS 2018. IEA Amsterdam, the Netherlands, was responsible for membership, translation verification, quality control, and the publication and wider dissemination of the report. IEA Hamburg, Germany, was mainly responsible for managing field operations, sampling procedures, and data processing.


Staff at IEA Amsterdam 


Dirk Hastedt, executive director 
Andrea Netten, director IEA Amsterdam 
Roel Burgers, financial director 
Michelle Djekić, research and liaison officer (former)
Sandra Dohr, junior research officer
David Ebbs, senior research officer
Sive Finlay, head of communications (former)
Isabelle Gémin, senior financial officer
Mirjam Govaerts, public relations and events officer
Gina Lamprell, junior publications officer 
Jennifer Ross, media and outreach officer 
Jasmin Schiffer, graphic designer 
Jan-Philipp Wagner, junior research officer 
Gillian Wilson, senior publications officer 
Staff at IEA Hamburg

Juliane Hencke, director IEA Hamburg 

Heiko Sibberns, director IEA Hamburg (former) 

Ralph Carstens, senior research advisor 

Sebastian Meyer, ICILS international data manager 

Michael Jung, ICILS international data manager (former) 
Ekaterina Mikheeva, ICILS deputy international data manager 
Lars Borchert, ICILS deputy international data manager (former) 
Sabine Meinck, head of research and analysis, and sampling units 
Sabine Tieck, research analyst (sampling) 

Sabine Weber, research analyst (sampling) 

Karsten Penon, research analyst (sampling) 

Duygu Savaşcı, research analyst (sampling)

Oriana Mora, research analyst 

Adeoye Oyekan, research analyst 

Hannah Kohler, research analyst 

Lorelia Lerps, research analyst 

Rea Car, research analyst 

Clara Beyer, research analyst 

Yasin Afana, research analyst 

Guido Martin, head of coding unit 

Katharina Sedelmayr, research analyst (coding) 

Deepti Kalamadi, programmer 

Maike Junod, programmer

Limiao Duan, programmer 

Devi Prasath, programmer (former) 

Bettina Wietzorek, meeting and seminar coordinator 


RM Results 


RM Results was responsible for developing the software systems underpinning the computer- 
based student assessment instruments for the main survey. This work included development of 
the test and questionnaire items, the assessment delivery system, and the web-based translation, 
scoring, and data-management modules. 


RM Results 
Mike Janic, managing director
Stephen Birchall, deputy CEO 
Erhan Halil, product development manager 
Rakshit Shingala, team leader 
James Liu, analyst programmer 

ilupuli Lunuwila, analyst programmer 
Richard Feng, analyst programmer 
Stephen Ainley, quality assurance 
Ranil Weerasinghe, quality assurance 
Grigory Loskutov, IT coordinator 
ICILS sampling referee


Marc Joncas was the sampling referee for the study. He provided invaluable advice on all sampling- 
related aspects of the study. 


National research coordinators 


The national research coordinators played a crucial role in the development of the project. They 
provided policy- and content-oriented advice on the development of the instruments and were 
responsible for the implementation of ICILS in the participating countries. 


Chile 

Carolina Leyton 

Maria Victoria Martinez 

Tabita Nilo 

National Agency for Educational Quality 


Denmark 
Jeppe Bundsgaard 
Danish School of Education, Aarhus University 


Finland 
Kaisa Leino 
Finnish Institute for Educational Research, University of Jyväskylä


France 

Marion Le Cam 

Ministry of National Education 

Germany and North Rhine-Westphalia (Germany) 
Birgit Eickelmann 

Institute for Educational Science, University of Paderborn 


Italy 

Elisa Caponera 

Riccardo Pietracci 

INVALSI (Istituto Nazionale per la Valutazione del Sistema Educativo di Istruzione e di Formazione) 
Gemma De Sanctis (until May 2018) 

MIUR (Ministero dell’Istruzione, dell’Università e della Ricerca)

Kazakhstan 

Aigerim Zuyeva 

Ruslan Abrayev 

Department for International Comparative Studies, Ministry of Education and Science 


Luxembourg 
Catalina Lomos 
Luxembourg Institute of Socio-Economic Research (LISER) 


Moscow (Russian Federation) 
Elena Zozulia 
Moscow Center for Quality of Education 


Portugal 
Vanda Lourenço
IAVE, I.P., Institute of Educational Evaluation


Republic of Korea 

Sangwook Park 

Kyongah Sang 

Korea Institute for Curriculum and Evaluation 
United States

Lydia Malley 

Linda Hamilton 

National Center for Education Statistics, US Department of Education


Uruguay

Cristobal Cobo 

Center for Research—Ceibal Foundation 


Cecilia Hughes 
Evaluation and Monitoring Department at Plan Ceibal 
APPENDIX B:
Characteristics of national samples 


For each educational system participating in ICILS 2018, this appendix describes population 
coverage, exclusion categories, stratification variables, and any deviations from the general ICILS 
sampling design. 


The same sample of schools was selected for the student survey and the teacher survey. However,
a school's participation status can differ between the student and teacher surveys. It is
particularly common for a school to count as participating in the student survey but not in the
teacher survey; the reverse is also possible. If a school's participation status differs between
the two parts of ICILS 2018, the figures are displayed in two separate tables. If the status
counts are identical in both parts, the results are displayed in one combined table.


Numbers in brackets refer to the number of categories of specific stratification variables.


B.1 Chile 


• School level exclusions consisted of schools for children with special educational needs, very
small schools (fewer than six students in the target grade), and geographically inaccessible
schools. Within-school exclusions consisted of intellectually disabled students, functionally
disabled students, and non-native language speakers.


• Explicit stratification was performed by school type (grade 8 and 9, grade 8 only), school
administration type (public, private-subsidized, private), and urbanization (rural, urban),
resulting in 10 explicit strata.


• Implicit stratification was applied by national assessment performance group for
mathematics (four levels), giving a total of 36 implicit strata.


• The sample was disproportionately allocated to explicit strata.


• Schools were oversampled to allow for better estimates for private and rural schools.


• Small schools were selected with equal probabilities.


Table B.1.1: Allocation of student sample in Chile


School participation status—Student survey


| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Grade 8 & 9 - Public - Rural | 4 | 0 | 4 | 0 | 0 | 0 |
| Grade 8 & 9 - Public - Urban | 10 | 0 | 9 | 1 | 0 | 0 |
| Grade 8 & 9 - Private subsidized - Rural | 4 | 0 | 4 | 0 | 0 | 0 |
| Grade 8 & 9 - Private subsidized - Urban | 20 | 0 | 17 | 2 | 1 | 0 |
| Grade 8 & 9 - Private - Urban & rural | 46 | 0 | 38 | 3 | 5 | 0 |
| Grade 8 only - Public - Rural | 30 | 0 | 29 | 1 | 0 | 0 |
| Grade 8 only - Public - Urban | 28 | 0 | 27 | 1 | 0 | 0 |
| Grade 8 only - Private subsidized - Rural | 12 | 0 | 12 | 0 | 0 | 0 |
| Grade 8 only - Private subsidized - Urban | 22 | 1 | 21 | 0 | 0 | 0 |
| Grade 8 only - Private - Urban | 4 | 0 | 2 | 1 | 0 | 1 |
| Total | 180 | 1 | 163 | 9 | 6 | 1 |

Note: No schools with student participation rate below 50% were found. 
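Every allocation table in this appendix follows the same accounting identity: for each explicit stratum, the total number of sampled schools equals the ineligible schools plus the participating schools (sampled, first replacement, and second replacement) plus the non-participating schools, plus the excluded schools where that column is reported. The short Python sketch below checks this identity, and the column totals, against the rows of Table B.1.1. The `StratumRow` class and `column_totals` helper are illustrative names introduced here, not part of any ICILS software.

```python
from typing import List, NamedTuple, Tuple


class StratumRow(NamedTuple):
    """School counts for one explicit stratum in an allocation table."""
    stratum: str
    total_sampled: int
    ineligible: int
    sampled: int              # participating: originally sampled schools
    first_replacement: int    # participating: first replacement schools
    second_replacement: int   # participating: second replacement schools
    non_participating: int
    excluded: int = 0         # reported only in some tables (e.g., Finland, Germany, United States)

    def participating(self) -> int:
        return self.sampled + self.first_replacement + self.second_replacement

    def is_balanced(self) -> bool:
        """Check that every sampled school is accounted for by exactly one status."""
        accounted = self.ineligible + self.participating() + self.non_participating + self.excluded
        return accounted == self.total_sampled


def column_totals(rows: List[StratumRow]) -> Tuple[int, ...]:
    """Sum each numeric column, for comparison with a table's Total row."""
    return tuple(sum(column) for column in zip(*(row[1:] for row in rows)))


# Rows transcribed from Table B.1.1 (Chile, student survey).
CHILE_STUDENT = [
    StratumRow("Grade 8 & 9 - Public - Rural", 4, 0, 4, 0, 0, 0),
    StratumRow("Grade 8 & 9 - Public - Urban", 10, 0, 9, 1, 0, 0),
    StratumRow("Grade 8 & 9 - Private subsidized - Rural", 4, 0, 4, 0, 0, 0),
    StratumRow("Grade 8 & 9 - Private subsidized - Urban", 20, 0, 17, 2, 1, 0),
    StratumRow("Grade 8 & 9 - Private - Urban & rural", 46, 0, 38, 3, 5, 0),
    StratumRow("Grade 8 only - Public - Rural", 30, 0, 29, 1, 0, 0),
    StratumRow("Grade 8 only - Public - Urban", 28, 0, 27, 1, 0, 0),
    StratumRow("Grade 8 only - Private subsidized - Rural", 12, 0, 12, 0, 0, 0),
    StratumRow("Grade 8 only - Private subsidized - Urban", 22, 1, 21, 0, 0, 0),
    StratumRow("Grade 8 only - Private - Urban", 4, 0, 2, 1, 0, 1),
]

if __name__ == "__main__":
    unbalanced = [row.stratum for row in CHILE_STUDENT if not row.is_balanced()]
    print("Unbalanced strata:", unbalanced)
    print("Column totals:", column_totals(CHILE_STUDENT))
```

Running the script prints an empty list of unbalanced strata and the column totals (180, 1, 163, 9, 6, 1, 0), which match the Total row of Table B.1.1; the final 0 is the excluded-schools column, which the Chile tables do not report. The same check can help flag transcription errors in the other tables.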
Table B.1.2: Allocation of teacher sample in Chile
School participation status—Teacher survey

| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Grade 8 & 9 - Public - Rural | 4 | 0 | 4 | 0 | 0 | 0 |
| Grade 8 & 9 - Public - Urban | 10 | 0 | 9 | ? | 0 | ? |
| Grade 8 & 9 - Private subsidized - Rural | 4 | 0 | 4 | 0 | 0 | 0 |
| Grade 8 & 9 - Private subsidized - Urban | 20 | 0 | 17 | 1 | 1 | 1 |
| Grade 8 & 9 - Private - Urban & rural | 46 | 0 | 37 | 3 | 5 | 1 |
| Grade 8 only - Public - Rural | 30 | 0 | 28 | 1 | 0 | 1 |
| Grade 8 only - Public - Urban | 28 | 0 | 27 | 1 | 0 | 0 |
| Grade 8 only - Private subsidized - Rural | 12 | 0 | 12 | 0 | 0 | 0 |
| Grade 8 only - Private subsidized - Urban | 22 | 1 | 21 | 0 | 0 | 0 |
| Grade 8 only - Private - Urban | 4 | 0 | 2 | 1 | 0 | 1 |
| Total | 180 | 1 | 161 | ? | 6 | ? |
Note: Four schools with teacher participation rate below 50% were found. 


B.2 Denmark 


• School level exclusions consisted of schools for children with special education needs, treatment
centers, schools with fewer than five students in the target grade, and German, English, and
Waldorf schools. Within-school exclusions consisted of intellectually disabled students,
functionally disabled students, and non-native language speakers.


• No explicit stratification was performed.


• Implicit stratification was applied by assessment score, giving a total of five implicit strata.


• Small schools were selected with equal probabilities.


Table B.2.1: Allocation of student sample in Denmark


School participation status—Student survey 


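| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Denmark | 150 | 0 | 114 | 25 | 4 | 7 |
| Total | 150 | 0 | 114 | 25 | 4 | 7 |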

Note: No schools with student participation rate below 50% were found. 


Table B.2.2: Allocation of teacher sample in Denmark
School participation status—Teacher survey

| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Denmark | 150 | 0 | 109 | 25 | 4 | 12 |
| Total | 150 | 0 | 109 | 25 | 4 | 12 |


Note: Two schools were regarded as non-participating because the within-school participation rate was below 50%. 
B.3 Finland
• School level exclusions consisted of schools for children with special education needs and
schools with an instruction language other than Finnish or Swedish. Within-school exclusions
consisted of intellectually disabled students, functionally disabled students, and non-native
language speakers.
• Explicit stratification was performed by region (5), urbanization (urban, semi-urban, rural),
and language (2), resulting in nine explicit strata.
• Implicit stratification was applied by region (4) and urbanization (urban, semi-urban, rural),
giving a total of 17 implicit strata.
• School sample overlap between ICILS 2018 and TALIS 2018¹: Both samples were drawn at
once using minimum overlap control. The samples were proportionally allocated to explicit
strata. All schools were selected with equal probabilities.
Table B.3.1: Allocation of student sample in Finland
School participation status—Student survey

| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools | Excluded schools |
|---|---|---|---|---|---|---|---|
| Helsinki/Uusimaa - Urban & semi-urban & rural | 36 | 0 | 35 | 0 | 0 | 1 | 0 |
| Southern - Urban & semi-urban | 26 | 0 | 26 | 0 | 0 | 0 | 0 |
| Southern - Rural | 6 | 0 | 6 | 0 | 0 | 0 | 0 |
| Western - Urban & semi-urban | 28 | 1 | 25 | 0 | 0 | 1 | 1 |
| Western - Rural | 6 | 0 | 5 | 0 | 0 | 0 | 1 |
| Northern & Eastern - Urban & semi-urban | 28 | 1 | 26 | 1 | 0 | 0 | 0 |
| Northern & Eastern - Rural | 10 | 0 | 10 | 0 | 0 | 0 | 0 |
| Swedish speaking - No Åland | 8 | 0 | 8 | 0 | 0 | 0 | 0 |
| Swedish speaking - Åland | 2 | 0 | 2 | 0 | 0 | 0 | 0 |
| Total | 150 | 2 | 143 | 1 | 0 | 2 | 2 |
Note: No schools with student participation rate below 50% were found. 


1 ICILS 2018 was conducted in the same year as the OECD's Teaching and Learning International Survey (TALIS) 2018
and Programme for International Student Assessment (PISA) 2018, and some national education surveys. The ICILS
2018 sampling team collaborated closely with the staff implementing sampling for these studies to prevent school
sample overlap whenever possible. See Chapter 6 for further details.




Table B.3.2: Allocation of teacher sample in Finland


School participation status—Teacher survey


| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools | Excluded schools |
|---|---|---|---|---|---|---|---|
| Helsinki/Uusimaa - Urban & semi-urban & rural | 36 | 0 | 35 | 0 | 0 | 1 | 0 |
| Southern - Urban & semi-urban | 26 | 0 | 26 | 0 | 0 | 0 | 0 |
| Southern - Rural | 6 | 0 | 6 | 0 | 0 | 0 | 0 |
| Western - Urban & semi-urban | 28 | 1 | 24 | 0 | 0 | 2 | 1 |
| Western - Rural | 6 | 0 | 5 | 0 | 0 | 0 | 1 |
| Northern & Eastern - Urban & semi-urban | 28 | 1 | 26 | 1 | 0 | 0 | 0 |
| Northern & Eastern - Rural | 10 | 0 | 10 | 0 | 0 | 0 | 0 |
| Swedish speaking - No Åland | 8 | 0 | 8 | 0 | 0 | 0 | 0 |
| Swedish speaking - Åland | 2 | 0 | 2 | 0 | 0 | 0 | 0 |
| Total | 150 | 2 | 142 | 1 | 0 | 3 | 2 |

Note: One school with teacher participation rate below 50% was found. 


B.4 France 


• School level exclusions consisted of private schools without contract, schools in the overseas
territories and in Mayotte, and specialized schools. Within-school exclusions consisted of
intellectually disabled students, functionally disabled students, and non-native language
speakers.


• Explicit stratification was performed by school type (public school non priority education,
public school priority education, private school), urbanization (less than 10,000 inhabitants,
10,000 to 200,000 inhabitants, more than 200,000 inhabitants), and equipment (large digital
equipment, normal digital equipment), resulting in 18 explicit strata.


• No implicit stratification was applied.


• School sample overlap between ICILS 2018 and TALIS 2018: the ICILS sample was selected
using minimum overlap control to TALIS 2018.


• Schools in the large digital equipment strata were oversampled to allow for better estimates.


• Small schools were selected with equal probabilities.
Table B.4.1: Allocation of student sample in France


School participation status—Student survey


| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Public school - Non priority education - Less than 10,000 inhabitants - Large digital equipment | 9 | 0 | 9 | 0 | 0 | 0 |
| Public school - Non priority education - Less than 10,000 inhabitants - Normal digital equipment | 24 | 0 | 24 | 0 | 0 | 0 |
| Public school - Non priority education - 10,000 to 200,000 inhabitants - Large digital equipment | 8 | 0 | 8 | 0 | 0 | 0 |
| Public school - Non priority education - 10,000 to 200,000 inhabitants - Normal digital equipment | 18 | 0 | 17 | 1 | 0 | 0 |
| Public school - Non priority education - More than 200,000 inhabitants - Large digital equipment | 12 | 0 | 12 | 0 | 0 | 0 |
| Public school - Non priority education - More than 200,000 inhabitants - Normal digital equipment | 19 | 0 | 19 | 0 | 0 | 0 |
| Public school - Priority education - Less than 10,000 inhabitants - Large digital equipment | 4 | 0 | 4 | 0 | 0 | 0 |
| Public school - Priority education - Less than 10,000 inhabitants - Normal digital equipment | 4 | 0 | 4 | 0 | 0 | 0 |
| Public school - Priority education - 10,000 to 200,000 inhabitants - Large digital equipment | 4 | 0 | 4 | 0 | 0 | 0 |
| Public school - Priority education - 10,000 to 200,000 inhabitants - Normal digital equipment | 6 | 0 | 6 | 0 | 0 | 0 |
| Public school - Priority education - More than 200,000 inhabitants - Large digital equipment | 5 | 0 | 5 | 0 | 0 | 0 |
| Public school - Priority education - More than 200,000 inhabitants - Normal digital equipment | 8 | 0 | 8 | 0 | 0 | 0 |
| Private school - Less than 10,000 inhabitants - Large digital equipment | 4 | 0 | 4 | 0 | 0 | 0 |
| Private school - Less than 10,000 inhabitants - Normal digital equipment | 6 | 0 | 6 | 0 | 0 | 0 |
| Private school - 10,000 to 200,000 inhabitants - Large digital equipment | 4 | 0 | 4 | 0 | 0 | 0 |
| Private school - 10,000 to 200,000 inhabitants - Normal digital equipment | 8 | 0 | 8 | 0 | 0 | 0 |
| Private school - More than 200,000 inhabitants - Large digital equipment | 4 | 0 | 4 | 0 | 0 | 0 |
| Private school - More than 200,000 inhabitants - Normal digital equipment | 9 | 0 | 9 | 0 | 0 | 0 |
| Total | 156 | 0 | 155 | 1 | 0 | 0 |


Note: No schools with student participation rate below 50% were found. 


Table B.4.2: Allocation of teacher sample in France


School participation status—Teacher survey


| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Public school - Non priority education - Less than 10,000 inhabitants - Large digital equipment | 9 | 0 | 6 | 0 | 0 | 3 |
| Public school - Non priority education - Less than 10,000 inhabitants - Normal digital equipment | 24 | 0 | 21 | 0 | 0 | 3 |
| Public school - Non priority education - 10,000 to 200,000 inhabitants - Large digital equipment | 8 | 0 | 8 | 0 | 0 | 0 |
| Public school - Non priority education - 10,000 to 200,000 inhabitants - Normal digital equipment | 18 | 0 | 15 | 0 | 0 | 3 |
| Public school - Non priority education - More than 200,000 inhabitants - Large digital equipment | 12 | 0 | 8 | 0 | 0 | 4 |
| Public school - Non priority education - More than 200,000 inhabitants - Normal digital equipment | 19 | 0 | 14 | 0 | 0 | 5 |
| Public school - Priority education - Less than 10,000 inhabitants - Large digital equipment | 4 | 0 | 3 | 0 | 0 | 1 |
| Public school - Priority education - Less than 10,000 inhabitants - Normal digital equipment | 4 | 0 | 3 | 0 | 0 | 1 |
| Public school - Priority education - 10,000 to 200,000 inhabitants - Large digital equipment | 4 | 0 | 2 | 0 | 0 | 2 |
| Public school - Priority education - 10,000 to 200,000 inhabitants - Normal digital equipment | 6 | 0 | 4 | 0 | 0 | 2 |
| Public school - Priority education - More than 200,000 inhabitants - Large digital equipment | 5 | 0 | 4 | 0 | 0 | 1 |
| Public school - Priority education - More than 200,000 inhabitants - Normal digital equipment | 8 | 0 | 3 | 0 | 0 | 5 |
| Private school - Less than 10,000 inhabitants - Large digital equipment | 4 | 0 | 4 | 0 | 0 | 0 |
| Private school - Less than 10,000 inhabitants - Normal digital equipment | 6 | 0 | 5 | 0 | 0 | 1 |
| Private school - 10,000 to 200,000 inhabitants - Large digital equipment | 4 | 0 | 4 | 0 | 0 | 0 |
| Private school - 10,000 to 200,000 inhabitants - Normal digital equipment | 8 | 0 | 7 | 0 | 0 | 1 |
| Private school - More than 200,000 inhabitants - Large digital equipment | 4 | 0 | 3 | 0 | 0 | 1 |
| Private school - More than 200,000 inhabitants - Normal digital equipment | 9 | 0 | 8 | 0 | 0 | 1 |
| Total | 156 | 0 | 122 | 0 | 0 | 34 |

Note: Thirty-four schools were regarded as non-participating because the within-school participation rate was below 50%. 
• School level exclusions consisted of special education schools and very small schools (less than
[…] in the target grade). Within-school exclusions consisted of intellectually disabled students,
functionally disabled students, and non-native language speakers.


• Explicit stratification was performed by federal state (North Rhine- […] placed in a separate
stratum, resulting […]


• […] socioeconomic status predictor (3 levels) […]s, implicit stratification was done by […]
levels), giving a total of 65 implicit strata.


• School sample overlap between ICILS 2018, PISA 2018, and national assessment […] Educational […]
Bildungstrend 2018): the ICILS sample was selected using minimum overlap […]


• Schools in North Rhine-Westphalia were oversampled due to the benchmark characteristic.


• Small schools were selected with equal probabilities.


Table B.5.1: Allocation of student sample in Germany


School participation status—Student survey


| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools | Excluded schools |
|---|---|---|---|---|---|---|---|
| North Rhine-Westphalia - Gymnasium | 42 | 0 | 38 | 4 | 0 | 0 | 0 |
| North Rhine-Westphalia - Non-Gymnasium | 72 | 3 | 65 | ? | ? | ? | ? |
| Other federal states - Gymnasium | 44 | 0 | 37 | ? | ? | ? | ? |
| Other federal states - Non-Gymnasium | 72 | 0 | 51 | 4 | 3 | 13 | 1 |
| Special education schools - None | 4 | 1 | 2 | 0 | 0 | 1 | 0 |
| Total | 234 | 4 | 193 | 11 | 5 | 20 | 1 |

Note: Six schools were regarded as non-participating because the within-school participation rate was below 50%. 
Table B.5.2: Allocation of teacher sample in Germany


School participation status—Teacher survey


| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools | Excluded schools |
|---|---|---|---|---|---|---|---|
| North Rhine-Westphalia - Gymnasium | 42 | 0 | 36 | 4 | 0 | 2 | 0 |
| North Rhine-Westphalia - Non-Gymnasium | 72 | 3 | 65 | 1 | 0 | 3 | 0 |
| Other federal states - Gymnasium | 44 | 0 | 25 | 1 | 1 | 17 | 0 |
| Other federal states - Non-Gymnasium | 72 | 0 | 41 | 4 | 2 | 24 | 1 |
| Special education schools - None | 4 | 1 | 2 | 0 | 0 | 1 | 0 |
| Total | 234 | 4 | 169 | 10 | 3 | 47 | 1 |


Note: Thirty-three schools were regarded as non-participating because the within-school participation rate was below 50%. 


B.6 Italy 


• School level exclusions consisted of schools for children with special needs, schools with fewer
than six students in the target grade, schools with Slovenian as the language of instruction, and
schools in remote areas or on small islands. Within-school exclusions consisted of functionally
disabled students.


• Explicit stratification was performed by geographic region (North, Central, South).


• Implicit stratification was applied by school type (public, private) and performance group (5),
giving a total of 30 implicit strata.


• Small schools were selected with equal probabilities.


Table B.6.1: Allocation of student sample in Italy 


School participation status—Student survey 


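| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| North | 66 | 0 | 66 | 0 | 0 | 0 |
| Central | 28 | 0 | 24 | 4 | 0 | 0 |
| South | 56 | 0 | 53 | 3 | 0 | 0 |
| Total | 150 | 0 | 143 | 7 | 0 | 0 |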

Note: No schools with student participation rate below 50% were found. 


Table B.6.2: Allocation of teacher sample in Italy
School participation status—Teacher survey

| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| North | 66 | 0 | 66 | 0 | 0 | 0 |
| Central | 28 | 0 | 24 | 4 | 0 | 0 |
| South | 56 | 0 | 51 | 3 | 0 | 2 |
| Total | 150 | 0 | 141 | 7 | 0 | 2 |


Note: Two schools were regarded as non-participating because the within-school participation rate was below 50%. 
B.7 Kazakhstan


• School level exclusions consisted of schools for children with special needs, schools with fewer
than five students in the target grade, Uighur schools, Uzbek schools, Tajik schools, and other
language schools. Within-school exclusions consisted of intellectually disabled students,
functionally disabled students, and non-native language speakers.


• Explicit stratification was performed by urbanization (urban, rural) and language of instruction
(4), resulting in eight explicit strata.


• No implicit stratification was applied.
• The sample was disproportionately allocated to explicit strata.
• Schools were oversampled to allow for better estimates for the different language groups.


• Small schools were selected with equal probabilities.


Table B.7.1: Allocation of student sample in Kazakhstan 


School participation status—Student survey 


| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Urban - Kazakh only | 30 | 0 | 30 | 0 | 0 | 0 |
| Urban - Russian only | 20 | 0 | 20 | 0 | 0 | 0 |
| Urban - Kazakh & Russian | 30 | 0 | 30 | 0 | 0 | 0 |
| Urban - Other | 14 | 1 | 13 | 0 | 0 | 0 |
| Rural - Kazakh only | 36 | 0 | 36 | 0 | 0 | 0 |
| Rural - Russian only | 10 | 0 | 10 | 0 | 0 | 0 |
| Rural - Kazakh & Russian | 30 | 0 | 29 | 0 | 0 | 1 |
| Rural - Other | 16 | 1 | 15 | 0 | 0 | 0 |
| Total | 186 | 2 | 183 | 0 | 0 | 1 |

Note: One school with student participation rate below 50% was found. 


Table B.7.2: Allocation of teacher sample in Kazakhstan 


School participation status—Teacher survey 


| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Urban - Kazakh only | 30 | 0 | 30 | 0 | 0 | 0 |
| Urban - Russian only | 20 | 0 | 20 | 0 | 0 | 0 |
| Urban - Kazakh & Russian | 30 | 0 | 30 | 0 | 0 | 0 |
| Urban - Other | 14 | 1 | 13 | 0 | 0 | 0 |
| Rural - Kazakh only | 36 | 0 | 36 | 0 | 0 | 0 |
| Rural - Russian only | 10 | 0 | 10 | 0 | 0 | 0 |
| Rural - Kazakh & Russian | 30 | 0 | 30 | 0 | 0 | 0 |
| Rural - Other | 16 | 1 | 15 | 0 | 0 | 0 |
| Total | 186 | 2 | 184 | 0 | 0 | 0 |

Note: No schools with teacher participation rate below 50% were found. 
• […] for better estimates for rural, boys', and girls' schools.


• Small schools were selected with equal probabilities.


Table B.8.1: Allocation of student sample in Korea 
School participation status—Student survey


| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Urban - Boys | 10 | 0 | 10 | 0 | 0 | 0 |
| Urban - Girls | 10 | 0 | 10 | 0 | 0 | 0 |
| Urban - Mixed | 34 | 0 | 34 | 0 | 0 | 0 |
| Suburban - Boys | 12 | 0 | 12 | 0 | 0 | 0 |
| Suburban - Girls | 12 | 0 | 12 | 0 | 0 | 0 |
| Suburban - Mixed | 38 | 0 | 38 | 0 | 0 | 0 |
| Rural - Boys | 8 | 0 | 8 | 0 | 0 | 0 |
| Rural - Girls | 8 | 0 | 8 | 0 | 0 | 0 |
| Rural - Mixed | 18 | 0 | 18 | 0 | 0 | 0 |
| Total | 150 | 0 | 150 | 0 | 0 | 0 |

Note: No schools with student participation rate below 50% were found. 


Table B.8.2: Allocation of teacher sample in Korea
School participation status—Teacher survey

| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Urban - Boys | 10 | 0 | 10 | 0 | 0 | 0 |
| Urban - Girls | 10 | 0 | 10 | 0 | 0 | 0 |
| Urban - Mixed | 34 | 0 | 34 | 0 | 0 | 0 |
| Suburban - Boys | 12 | 0 | 11 | 0 | 0 | 1 |
| Suburban - Girls | 12 | 0 | 12 | 0 | 0 | 0 |
| Suburban - Mixed | 38 | 0 | 36 | 0 | 0 | 2 |
| Rural - Boys | 8 | 0 | 8 | 0 | 0 | 0 |
| Rural - Girls | 8 | 0 | 8 | 0 | 0 | 0 |
| Rural - Mixed | 18 | 0 | 18 | 0 | 0 | 0 |
| Total | 150 | 0 | 147 | 0 | 0 | 3 |

Note: No schools with teacher participation rate below 50% were found. 
B.9 Luxembourg


• No school-level exclusions. Within-school exclusions consisted of intellectually disabled students,
functionally disabled students, and non-native language speakers.


• Explicit stratification was performed by curriculum (schools following the national curriculum,
schools following a different curriculum), resulting in two explicit strata.


• No implicit stratification was applied.


• A census of all schools and students was conducted. All variance estimates were computed using
schools as variance strata.


Table B.9.1: Allocation of student sample in Luxembourg 


School participation status—Student survey 


| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Schools following national curriculum | 33 | 0 | 32 | 0 | 0 | 1 |
| Schools with different curriculum | 8 | 0 | 6 | 0 | 0 | 2 |
| Total | 41 | 0 | 38 | 0 | 0 | 3 |

Note: No schools with student participation rate below 50% were found. 


Table B.9.2: Allocation of teacher sample in Luxembourg
School participation status—Teacher survey

| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Schools following national curriculum | 33 | 0 | 23 | 0 | 0 | 10 |
| Schools with different curriculum | 8 | 0 | 5 | 0 | 0 | 3 |
| Total | 41 | 0 | 28 | 0 | 0 | 13 |

Note: Ten schools were regarded as non-participating because the within-school participation rate was below 50%.


B.10 Portugal 


• School-level exclusions consisted of schools with fewer than seven students in the target
grade, and international schools. Within-school exclusions consisted of intellectually disabled
students, functionally disabled students, and non-native language speakers.


• Explicit stratification was implemented by school type (public, private) and subregions (23),
resulting in 28 explicit strata.


• No implicit stratification was applied.
• Small school samples within regions necessitated disproportionate sample allocations.


• Small schools were selected with equal probabilities.
Table B.10.1: Allocation of student sample in Portugal


School participation status—Student survey


Explicit strata


Total

Private - Other Regions

1 189


11


19


Note

ow 50%.
Table B.10.2: Allocation of teacher sample in Portugal


School participation status—Teacher survey


| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Public - Alto Minho | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Cávado | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Ave | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Área Metropolitana do Porto | 18 | 0 | 17 | 0 | 0 | 1 |
| Public - Alto Tâmega | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Tâmega e Sousa | 6 | 0 | 5 | 0 | 0 | 1 |
| Public - Douro | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Terras de Trás-os-Montes | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Oeste | 6 | 0 | 5 | 0 | 0 | 1 |
| Public - Região de Aveiro | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Região de Coimbra | 6 | 0 | 5 | 1 | 0 | 0 |
| Public - Região de Leiria | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Viseu Dão Lafões | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Beira Baixa | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Médio Tejo | 6 | 0 | 5 | 1 | 0 | 0 |
| Public - Beiras e Serra da Estrela | 6 | 0 | 5 | 1 | 0 | 0 |
| Public - Área Metropolitana de Lisboa | 28 | 0 | 26 | 0 | 0 | 2 |
| Public - Alentejo Litoral | 6 | 0 | 5 | 0 | 0 | 1 |
| Public - Baixo Alentejo | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Lezíria do Tejo | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Alto Alentejo | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Alentejo Central | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Algarve | 6 | 0 | 6 | 0 | 0 | 0 |
| Public - Região Autónoma dos Açores | 6 | 0 | 4 | 1 | 0 | 1 |
| Public - Região Autónoma da Madeira | 6 | 0 | 6 | 0 | 0 | 0 |
| Private - Área Metropolitana do Porto | 7 | 0 | 5 | 2 | 0 | 0 |
| Private - Área Metropolitana de Lisboa | 7 | 0 | 6 | 1 | 0 | 0 |
| Private - Other Regions | 22 | 1 | 13 | 4 | 0 | 4 |
| Total | 220 | 1 | 197 | 11 | 0 | 11 |

Note: Seven schools were regarded as non-participating because the within-school participation rate was below 50%. 
B.11 United States


• School-level exclusions consisted of schools with fewer than two classes in the target grade
and private schools. Within-school exclusions consisted of intellectually disabled students,
functionally disabled students, and non-native language speakers.


• Explicit stratification was performed by poverty level (2), school type (public, private), and
geographic regions (Northeast, Midwest, South, West), resulting in 12 explicit strata.


• Implicit stratification was applied by school location (city, rural, suburban, town) and ethnicity
status, giving a total of 96 implicit strata.


• Small schools were selected with equal probabilities.


Table B.11.1: Allocation of student sample in the United States 


School participation status—Student survey 


| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools | Excluded schools |
|---|---|---|---|---|---|---|---|
| High poverty - Public - Northeast | 17 | 0 | 6 | 1 | 0 | 10 | 0 |
| High poverty - Public - Midwest | 25 | 1 | 18 | 0 | 0 | 6 | 0 |
| High poverty - Public - South | 68 | 1 | 58 | 3 | 2 | 4 | 0 |
| High poverty - Public - West | 42 | 1 | 35 | 4 | 0 | 2 | 0 |
| Low poverty - Private - Northeast | 7 | 0 | 1 | 2 | 0 | 4 | 0 |
| Low poverty - Private - Midwest | 7 | 0 | 4 | 1 | 0 | 2 | 0 |
| Low poverty - Private - South | 10 | 1 | 5 | 0 | 0 | 4 | 0 |
| Low poverty - Private - West | 6 | 1 | 1 | 0 | 1 | 3 | 0 |
| Low poverty - Public - Northeast | 34 | 0 | 12 | 2 | 1 | 19 | 0 |
| Low poverty - Public - Midwest | 43 | 1 | 24 | 5 | 2 | 11 | 0 |
| Low poverty - Public - South | 56 | 1 | 40 | 5 | 1 | 8 | 1 |
| Low poverty - Public - West | 37 | 1 | 27 | 2 | 0 | 6 | 1 |
| Total | 352 | 8 | 231 | 25 | 7 | 79 | 2 |

Note: Three schools were regarded as non-participating because the within-school participation rate was below 50%. 
Table B.11.2: Allocation of teacher sample in the United States


School participation status—Teacher survey


| Explicit strata | Total sampled schools | Ineligible schools | Participating: sampled schools | Participating: first replacement | Participating: second replacement | Non-participating schools | Excluded schools |
|---|---|---|---|---|---|---|---|
| High poverty - Public - Northeast | 17 | 0 | 5 | 1 | 0 | 11 | 0 |
| High poverty - Public - Midwest | 25 | 1 | 17 | 0 | 0 | 7 | 0 |
| High poverty - Public - South | 68 | 1 | 57 | 3 | 2 | 5 | 0 |
| High poverty - Public - West | 42 | 1 | 33 | 4 | 0 | 4 | 0 |
| Low poverty - Private - Northeast | 7 | 0 | 1 | 2 | 1 | 3 | 0 |
| Low poverty - Private - Midwest | 7 | 0 | 4 | 1 | 0 | 2 | 0 |
| Low poverty - Private - South | 10 | 1 | 5 | 0 | 0 | 4 | 0 |
| Low poverty - Private - West | 6 | 1 | 1 | 0 | 1 | 3 | 0 |
| Low poverty - Public - Northeast | 34 | 0 | 12 | 2 | 1 | 19 | 0 |
| Low poverty - Public - Midwest | 43 | 1 | 25 | 5 | 2 | 10 | 0 |
| Low poverty - Public - South | 56 | 1 | 40 | 5 | 1 | 8 | 1 |
| Low poverty - Public - West | 37 | 1 | 26 | 2 | 0 | 7 | 1 |
| Total | 352 | 8 | 226 | 25 | 8 | 83 | 2 |


Note: Seven schools were regarded as non-participating because the within-school participation rate was below 50%. 


B.12 Uruguay 
• School-level exclusions consisted of schools for children with special needs and rural schools.
• Explicit stratification was performed by school sector (public, private), region (Montevideo, Interior urban), and school type (Lyceum, Utu).
• No implicit stratification was applied.
• The sample was disproportionately allocated to explicit strata. Schools were oversampled to allow for better estimates for public and private schools, Montevideo and Interior urban schools, as well as Lyceum (high school) and Utu schools (vocational schools).
• To enable overlap control for PISA 2018, schools in one stratum were selected with equal probabilities; in the other strata, the selection probabilities were capped at 0.5.
• Small schools were selected with equal probabilities.
• Variance estimates were computed using schools as variance strata.
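The Uruguay design caps selection probabilities at 0.5. The text does not say how the cap was implemented; one common approach, assumed here, is to compute probabilities proportional to size (PPS), fix any that exceed the cap at the cap, and redistribute the remaining expected sample size among the uncapped schools until no probability exceeds it. The function name and its inputs are hypothetical.

```python
def capped_pps_probabilities(sizes, n, cap=0.5):
    """Inclusion probabilities proportional to size, with every probability
    capped at `cap` and the excess expected sample size redistributed
    proportionally among the uncapped schools (assumed approach)."""
    if any(size <= 0 for size in sizes):
        raise ValueError("measures of size must be positive")
    if n > cap * len(sizes):
        raise ValueError("expected sample size cannot be reached under the cap")
    capped = set()
    probs = [0.0] * len(sizes)
    while True:
        free = [i for i in range(len(sizes)) if i not in capped]
        remaining = n - cap * len(capped)  # expected sample size still to allocate
        free_total = sum(sizes[i] for i in free)
        newly_capped = False
        for i in free:
            probs[i] = remaining * sizes[i] / free_total
            if probs[i] > cap:
                capped.add(i)
                newly_capped = True
        if not newly_capped:  # no probability exceeds the cap any more
            break
    for i in capped:
        probs[i] = cap
    return probs
```

For example, with sizes [100, 10, 10, 10, 10, 10] and an expected sample of two schools, the large school is capped at 0.5 and each of the others receives 0.3, so the probabilities still sum to 2.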
Table B.12.1: Allocation of student sample in Uruguay




School participation status—Student survey 


| Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Public - Montevideo - Lyceum | 30 | 1 | 23 | 3 | 0 | 3 |
| Public - Montevideo - Utu | 27 | 1 | 25 | 0 | 0 | 1 |
| Public - Interior urban - Lyceum | 60 | 1 | 57 | 1 | 0 | 1 |
| Public - Interior urban - Utu | 30 | 1 | 28 | 0 | 0 | 1 |
| Private - Montevideo - Lyceum | 18 | 0 | 15 | 3 | 0 | 0 |
| Private - Interior urban - Lyceum | 12 | 1 | 11 | 0 | 0 | 0 |
| Total | 177 | 5 | 159 | 7 | 0 | 6 |


Note: Six schools were regarded as non-participating because the within-school participation rate was below 50%.

Table B.12.2: Allocation of teacher sample in Uruguay


School participation status—Teacher survey 


| Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Public - Montevideo - Lyceum | 30 | 1 | 18 | 2 | 0 | 9 |
| Public - Montevideo - Utu | 27 | 1 | 11 | 0 | 0 | 15 |
| Public - Interior urban - Lyceum | 60 | 1 | 42 | 1 | 0 | 16 |
| Public - Interior urban - Utu | 30 | 1 | 24 | 0 | 0 | 5 |
| Private - Montevideo - Lyceum | 18 | 0 | 12 | 3 | 0 | 3 |
| Private - Interior urban - Lyceum | 12 | 1 | 8 | 0 | 0 | 3 |
| Total | 177 | 5 | 115 | 6 | 0 | 51 |


Note: Fifty-one schools were regarded as non-participating because the within-school participation rate was below 50%.

Benchmarking participants

B.13 Moscow, Russian Federation


• School-level exclusions consisted of schools with fewer than seven students. Within-school exclusions consisted of intellectually disabled students, functionally disabled students, and non-native language speakers.
• Explicit stratification was performed by school performance.
• Implicit stratification was applied by school type (public, private), giving a total of 27 implicit strata.
• In the first two explicit strata, schools were selected with equal probabilities.
• Small schools were selected with equal probabilities.
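These bullets rely on explicit and implicit stratification. In IEA school samples, implicit stratification is typically obtained by sorting the school frame within each explicit stratum by the implicit stratification variables and then drawing a systematic sample with probability proportional to size (PPS), so that the sample spreads across the sorted categories. The sketch below illustrates that idea for a single explicit stratum; the frame layout, the `size` key, and the function name are assumptions, and certainty selections for very large schools are not handled.

```python
import random


def systematic_pps_sample(frame, n, sort_keys, size_key="size", rng=None):
    """Select n schools by systematic PPS sampling after sorting the frame
    by the implicit stratification variables.

    Assumes no school's size exceeds the sampling interval; such schools
    would be certainty selections and are not handled here.
    """
    rng = rng or random.Random()
    # Implicit stratification: order the frame before systematic selection.
    ordered = sorted(frame, key=lambda s: tuple(s[k] for k in sort_keys))
    total = sum(s[size_key] for s in ordered)
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    points = [start + k * interval for k in range(n)]

    selected, cumulative, j = [], 0, 0
    for school in ordered:
        cumulative += school[size_key]
        # A school is selected when a selection point falls inside its size range.
        while j < n and points[j] < cumulative:
            selected.append(school)
            j += 1
    return selected
```

Giving every school the same size (for example 1) turns the same routine into equal-probability systematic sampling within the stratum.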
Table B.13.1: Allocation of student and teacher sample in Moscow, Russian Federation

School participation status—Student and teacher survey


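| Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Top 1-50 schools | 26 | 0 | 26 | 0 | 0 | 0 |
| Top 51-100 schools | 26 | 0 | 26 | 0 | 0 | 0 |
| Top 101-200 schools | 30 | 0 | 30 | 0 | 0 | 0 |
| Top 201-300 schools | 30 | 0 | 30 | 0 | 0 | 0 |
| Top 301 and above - all other schools | 38 | 0 | 36 | 1 | 1 | 0 |
| Total | 150 | 0 | 148 | 1 | 1 | 0 |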

Note: No schools with student participation rate below 50% were found. No schools with teacher participation rate below 50% were found. 


B.14 North Rhine-Westphalia, Germany 


• School-level exclusions consisted of special education schools and very small schools (fewer than three students in the target grade). Within-school exclusions consisted of intellectually disabled students, functionally disabled students, and non-native language speakers.
• Explicit stratification for regular schools was performed by school type (Gymnasium, non-Gymnasium). Special education schools with students able to do the test were placed in a separate stratum, resulting in three explicit strata.
• Implicit stratification for regular schools was applied by socioeconomic status predictor (3 levels), giving a total of four implicit strata.
• School sample overlap between ICILS 2018, PISA 2018, and the National Assessment of Educational Standards 2018: the ICILS sample was selected using minimum overlap control with respect to both surveys.
• Small schools were selected with equal probabilities.


Table B.14.1: Allocation of student sample in North Rhine-Westphalia, Germany 


School participation status—Student survey 


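| Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Gymnasium | 42 | 0 | 38 | 4 | 0 | 0 |
| Non-Gymnasium | 72 | 3 | 65 | 1 | 0 | 3 |
| Special education schools | 1 | 0 | 1 | 0 | 0 | 0 |
| Total | 115 | 3 | 104 | 5 | 0 | 3 |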
Note: No schools with student participation rate below 50% were found. 
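As a rough check on tables such as B.14.1, school participation rates before and after replacement can be computed as unweighted shares of eligible sampled schools. The rates reported by ICILS are based on sampling weights, which this report documents together with participation rates, so the unweighted version below is only an approximation. Treating excluded schools like ineligible ones (dropping them from the denominator) is an assumption, and the function name is hypothetical.

```python
def unweighted_participation_rates(total, ineligible, sampled, first, second, excluded=0):
    """Return (before replacement, after replacement) school participation
    rates as unweighted shares of eligible sampled schools."""
    eligible = total - ineligible - excluded  # assumed denominator
    before = sampled / eligible
    after = (sampled + first + second) / eligible
    return before, after


# Table B.14.1, total row: 115 sampled, 3 ineligible, 104 + 5 + 0 participating.
before, after = unweighted_participation_rates(115, 3, 104, 5, 0)
print(f"before replacement: {before:.1%}, after replacement: {after:.1%}")
# prints: before replacement: 92.9%, after replacement: 97.3%
```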
Table B.14.2: Allocation of teacher sample in North Rhine-Westphalia, Germany 
School participation status—Teacher survey 
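| Explicit strata | Total sampled schools | Ineligible schools | Participating schools: Sampled | Participating schools: First replacement | Participating schools: Second replacement | Non-participating schools |
|---|---|---|---|---|---|---|
| Gymnasium | 42 | 0 | 36 | 4 | 0 | 2 |
| Non-Gymnasium | 72 | 3 | 65 | 1 | 0 | 3 |
| Special education schools | 1 | 0 | 1 | 0 | 0 | 0 |
| Total | 115 | 3 | 102 | 5 | 0 | 5 |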
Note: Three schools were regarded as non-participating because the within-school participation rate was below 50%. 
National items excluded from scaling
IEA’s International Computer and Information Literacy Study (ICILS) 2018 investigated
how well students are prepared for study, work, and life in a digital world. ICILS 2018 
measured international differences in students’ computer and information literacy (CIL): 
their ability to use computers to investigate, create, participate, and communicate at 
home, at school, in the workplace, and in the community. Participating countries had 
an additional option for their students to complete an assessment of computational 
thinking (CT): their ability to recognize aspects of real-world problems appropriate for 
computational formulation, and to evaluate and develop algorithmic solutions to those 
problems, so that the solutions could be operationalized with a computer. 


This technical report follows the publication of several international and regional 
reports that presented the results of ICILS 2018. It provides a comprehensive account 
of the conceptual, methodological, and analytical implementation of the study. It 
includes detailed information on the development of the data-collection instruments 
used, including their translation and translation verification, on sampling design and 
implementation, sampling weights and participation rates, survey operation procedures, 
quality control of data collection, data management and creation of the international 
database, scaling procedures, and analysis of ICILS 2018 data. The technical report 
enables researchers to evaluate published reports and articles based on data from this 
study and, used in conjunction with the ICILS 2018 User Guide for the International 
Database, will provide guidance for their own analyses. 